Doug Henschen

Executive Editor, InformationWeek

SAS Gets Hip To Hadoop For Big Data

Doug Henschen, InformationWeek | October 15, 2012 08:30 AM


SAS made a slew of announcements at its annual user conference in Las Vegas last week, but none was more important than the news around its High-Performance Analytic (HPA) Server. Of particular importance was the news that HPA will work with Apache Hadoop, the fast-growing big-data processing platform.

HPA isn't currently something that most SAS customers use, much less hope to use in conjunction with Hadoop. But HPA is a cutting-edge product that is crucial to the company's future. Making HPA run on Hadoop is a key step to bringing SAS' vast portfolio of analytic capabilities into the open-source-dominated big data world, where data scientists are writing their own algorithms, embracing big-data-focused startups, or adapting open-source code written in the R programming language.

SAS is using an agile development approach with HPA so it can quickly expand the product's capabilities and keep pace with the big data world. "We're working on an open-source timeline," Tapan Patel, a SAS product marketing manager, told InformationWeek. "The Hadoop and R communities are making so many changes, so we have to adapt."

HPA already runs in the relational world on EMC Greenplum and Teradata. By using the massively parallel processing (MPP) power of these platforms for in-database analysis, analysts can save hours if not days over the old approach of moving data sets out of a data warehouse, analyzing them on a dedicated (but often underpowered) analytic server, and then moving the results back into the data warehouse.
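
To make the contrast concrete, here is a minimal sketch of the two approaches. It is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for a data warehouse (SQLite is not an MPP platform and has nothing to do with HPA), and the `sales` table, its rows, and both function names are hypothetical.

```python
import sqlite3

# Hypothetical stand-in for a data warehouse: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 75.0)],
)


def totals_by_extract(db: sqlite3.Connection) -> dict:
    """Old approach: copy every row out, then aggregate outside the database."""
    totals = {}
    for region, amount in db.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_in_database(db: sqlite3.Connection) -> dict:
    """In-database approach: the engine aggregates; only the results leave."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return dict(db.execute(query))
```

Both functions return the same totals. The difference is how much data crosses the boundary: every row in the first case, one row per region in the second. On warehouse-sized tables, that data movement is where the hours go.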

Other vendors have adopted the in-database approach, including IBM and Oracle as well as EMC Greenplum and Teradata. And these database suppliers are working with SAS and SAS rivals including Alpine Data Labs, Fuzzy Logix, Revolution Analytics, and others to broaden the analytics they can apply within their databases.

SAS pioneered in-database work, and with HPA it was already supporting predictive analytics and data mining in partnership with EMC and Teradata. With the latest release, announced last week and now shipping, HPA has added text mining, optimization, and forecasting capabilities.

Text mining makes sense of text-rich information such as insurance claims, warranty claims, customer surveys, or the growing streams of customer comments on social networks. Optimization helps retailers and consumer goods makers, among others, with tasks such as setting prices for the best possible balance of strong-yet-profitable sales. Forecasting is used by insurance companies, for example, to estimate exposure or losses in the event of a hurricane or flood.

Where Hadoop is concerned, the latest release technically already runs on the platform, but it's limited to a SAS-customized version of the open source software based on Apache Hadoop v1.0 (also known as version 0.20.20x). SAS says HPA will run on mainstream distributions of Hadoop from the likes of Cloudera with an upcoming December release of HPA that will be based on Apache Hadoop v2.0 (also known as version 0.23).

Whether you're using SAS' current Hadoop software or plan to embrace the v2.0 release, HPA provides a graphical user interface that lets you tap HDFS, MapReduce, Pig, and Hive to apply SAS analyses to the vast data sets residing on Hadoop. MapReduce is the primary model for processing data on Hadoop. Pig is an open source Apache programming tool and language for writing MapReduce jobs. Hive is a data warehousing infrastructure built on top of Hadoop that supports data summarization, query, and analysis. HPA also supports Pig and MapReduce code generation, visual editing, and syntax checking. Finally, SAS Data Integration Studio data transformations and SAS DataFlux data quality routines have also been adapted to Hadoop.
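
For readers new to the MapReduce model, a rough in-memory sketch may help. It is plain Python, not Hadoop code and not anything HPA generates; the function names and the word-count task are assumptions chosen for illustration. A map step emits key/value pairs, a shuffle step groups them by key, and a reduce step collapses each group into a single result.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce over the records in a single process."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

On a real cluster the map and reduce steps run in parallel across many machines, and the framework handles the shuffle. This sketch runs everything in one process to show the data flow only.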

The key question is whether Hadoop practitioners, who may now be used to using open-source and home-grown analytics, will want to bring a commercial product like SAS into what many view as a new computing paradigm.

"We're going open source as a company, so our skill set has had to change over the last three years," says Phil Shelley, vice president and chief technology officer at Sears Holdings. The shift started at the operating-system level with a move toward Linux, but it has since worked its way up the stack to the database and analytics level. "Our statistical people used to just use SAS and other [commercial] products, but now we're teaching them to use R on Hadoop," Shelley says.

Cost will certainly be a software selection factor, as that's a big reason companies are adopting Hadoop; they're trying to retain and make use of all their data, and they expect cost savings over conventional relational databases when scaling out over hundreds of terabytes or more. Sears, for example, has more than 2 petabytes of data on hand, and until it implemented Hadoop two years ago, Shelley says, the company was constantly outgrowing databases and still couldn't store everything on one platform.


