Commentary by Doug Henschen, 10/14/2012 04:50 PM

SAS Gets Hip To Hadoop For Big Data

SAS High-Performance Analytic Server heads for Hadoop, bringing SAS data mining, text mining, optimization, and forecasting capabilities beyond the relational database world.

SAS made a slew of announcements at its annual user conference in Las Vegas last week, but none was more important than the news around its High-Performance Analytic (HPA) Server. Most significant is that HPA will work with Apache Hadoop, the fast-growing big-data processing platform.

HPA isn't currently something that most SAS customers use, much less hope to use in conjunction with Hadoop. But HPA is a cutting-edge product that is crucial to the company's future. Making HPA run on Hadoop is a key step toward bringing SAS' vast portfolio of analytic capabilities into the open-source-dominated big data world, where data scientists are writing their own algorithms, embracing big-data-focused startups, or adapting open-source code written in the R programming language.

SAS is using an agile development approach with HPA so it can quickly expand the product's capabilities and adapt to the pace of the big data world. "We're working on an open-source timeline," Tapan Patel, a SAS product marketing manager, told InformationWeek. "The Hadoop and R communities are making so many changes, so we have to adapt."

HPA already runs in the relational world on EMC Greenplum and Teradata. By using the massively parallel processing (MPP) power of these platforms for in-database analysis, analysts can save hours, if not days, over the old approach of moving data sets out of a data warehouse, analyzing them on a dedicated (but often underpowered) analytic server, and then moving the results back into the data warehouse.
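
To make the data-movement argument concrete, here is a small, hypothetical Python sketch that contrasts the two patterns. It uses the standard library's sqlite3 as a stand-in for an MPP warehouse such as Greenplum or Teradata; the claims table, its columns, and the averaging analysis are invented for illustration and are not SAS code.

```python
# Contrast: extract-then-analyze vs. in-database analysis.
# sqlite3 stands in for an MPP warehouse; the schema is hypothetical.
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE claims (policy_id INTEGER, region TEXT, amount REAL);
    INSERT INTO claims VALUES (1, 'east', 1200.0), (2, 'east', 800.0),
                              (3, 'west', 2500.0), (4, 'west', 300.0);
""")

# Old pattern: pull every row out of the warehouse, analyze it on a separate
# (often underpowered) analytic server, then ship the results back.
rows = conn.execute("SELECT region, amount FROM claims").fetchall()
by_region = {}
for region, amount in rows:
    by_region.setdefault(region, []).append(amount)
extracted_result = {r: statistics.mean(v) for r, v in by_region.items()}

# In-database pattern: push the computation to where the data lives, so only
# the small result set ever crosses the wire.
in_db_result = dict(conn.execute(
    "SELECT region, AVG(amount) FROM claims GROUP BY region"))

print(extracted_result == in_db_result)  # True: same answer, far less data moved
```

With four rows the difference is invisible; at warehouse scale, the first pattern drags the whole table across the network while the second moves only a handful of numbers, and an MPP engine spreads the work across its nodes. That saved data movement and parallelism are what SAS is trading on by pushing its analytics in-database.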

[ Want more on in-database analysis alternatives? Read IBM Answers Oracle Exadata. ]

Other vendors have adopted the in-database approach, including IBM and Oracle as well as EMC Greenplum and Teradata. And these database suppliers are working with SAS and with SAS rivals such as Alpine Data Labs, Fuzzy Logix, and Revolution Analytics to broaden the analytics they can apply within their databases.

SAS was an early pioneer of in-database work, and with HPA it was already supporting predictive analytics and data mining in partnership with EMC and Teradata. With the latest release announced last week (and now shipping), HPA has added text mining, optimization, and forecasting capabilities.

Text mining makes sense of text-rich information such as insurance claims, warranty claims, customer surveys, and the growing streams of customer comments on social networks. Optimization helps retailers and consumer goods makers, among others, with tasks such as setting prices that balance strong sales against healthy margins. Forecasting is used by insurance companies, for example, to estimate exposure or losses in the event of a hurricane or flood.

Where Hadoop is concerned, the latest release technically already runs on the platform, but it's limited to a SAS-customized version of the open source software based on Apache Hadoop v1.0 (also known as version 0.20.20x). SAS says HPA will run on mainstream Hadoop distributions from the likes of Cloudera with an upcoming December release of HPA that will be based on Apache Hadoop v2.0 (also known as version 0.23).

Whether you're using SAS' current Hadoop software or plan to embrace the v2.0 release, HPA provides a graphical user interface that lets you tap HDFS, MapReduce, Pig, and Hive to apply SAS analyses to the vast data sets residing on Hadoop. MapReduce is the primary model for processing data on Hadoop. Pig is an open source Apache tool and language for writing MapReduce jobs. Hive is data warehousing infrastructure built on top of Hadoop that supports data summarization, query, and analysis. HPA also supports Pig and MapReduce code generation, visual editing, and syntax checking. Finally, SAS Data Integration Studio data transformations and SAS DataFlux data quality routines have also been adapted to Hadoop.
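
For readers who haven't written MapReduce by hand, here is a minimal, hypothetical sketch of what such a job looks like as a Hadoop Streaming script in Python. It counts warranty claims per product code; the tab-delimited layout, the column positions, and the "claims.py" file name are illustrative assumptions, not anything shipped by SAS.

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming sketch of a MapReduce job: count warranty claims
# per product code. Illustrative plumbing only; the input layout and column
# positions are hypothetical, and this is not SAS-generated code.
import sys


def mapper():
    # Emit "product_code<TAB>1" for every claim record arriving on stdin.
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            product_code = fields[1]  # assumed: second column holds the code
            print("%s\t1" % product_code)


def reducer():
    # Hadoop sorts mapper output by key, so identical keys arrive contiguously.
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key == current_key:
            count += int(value)
        else:
            if current_key is not None:
                print("%s\t%d" % (current_key, count))
            current_key, count = key, int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))


if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

A job like this is typically launched through the Hadoop Streaming jar, roughly `hadoop jar hadoop-streaming.jar -input /claims -output /claims_by_product -mapper "claims.py map" -reducer "claims.py reduce" -file claims.py`, with paths and jar locations varying by distribution. Pig and Hive let you express the same aggregation in a few lines of script or SQL-like query, and it's this kind of hand-written plumbing that HPA's graphical interface and code generation are meant to spare analysts from.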

The key question is whether Hadoop practitioners, who may now be used to using open-source and home-grown analytics, will want to bring a commercial product like SAS into what many view as a new computing paradigm.

"We're going open source as a company, so our skill set has had to change over the last three years," says Phil Shelley, vice president and chief technology officer at Sears Holdings. The move started with operating systems, with a move toward Linux, but the change has moved up the stack to the database and analytics level. "Our statistical people used to just use SAS and other [commercial] products, but now we're teaching them to use R on Hadoop," Shelley says.

Cost will certainly be a software selection factor, as it's a big reason companies are adopting Hadoop: they're trying to retain and make use of all their data, and they expect cost savings over conventional relational databases when scaling out to hundreds of terabytes or more. Sears, for example, has more than 2 petabytes of data on hand, and until it implemented Hadoop two years ago, Shelley says, the company was constantly outgrowing databases and still couldn't store everything on one platform.

Comments
EricLundquist, 10/15/2012 7:07 PM:
Hadoop, R, and big data analysis engines are reinventing the business intelligence sector. SAS is on top of the trend by embracing it fully rather than settling for half-hearted partnerships. Oracle and Microsoft need to step up their game here, IMHO.