11/7/2011
11:20 AM

Hadoop Spurs Big Data Revolution

Open source data processing platform has won over Web giants for its low cost, scalability, and flexibility. Now Hadoop will make its way into more enterprises.

What's Ahead?

Companies already using Hadoop invariably have bigger plans. AOL is moving critical applications to its 700-node production environment, which is described as a highly reliable and controlled deployment, providing data down to granular levels of detail. The 300-node R&D environment is where many of the company's most advanced Ph.D. analytics experts work on cutting-edge projects. Cloudera provides the enterprise support for both deployments, helping AOL with bug fixes, software upgrades, and service problems.

At ComScore, it will be several months before Hadoop can scale up and replace its data processing grid, Brown says. That move was delayed in part because ComScore switched from Cloudera's Hadoop distribution to MapR's, which ComScore licensed through EMC Greenplum. MapR's version of Hadoop will let ComScore switch from HDFS to the more mature and widely used Network File System. NFS will enable the company to easily move data back and forth among Hadoop, Sybase IQ, and other data sources and systems, something it couldn't do with HDFS, Brown says.

EMC and partner MapR introduced new Hadoop software and support options this spring, as did IBM with its BigInsights offering. IBM partner Karmasphere, which provides Hadoop development and analytics tools, recently introduced a virtual appliance for BigInsights, designed to speed development of MapReduce jobs and related analytics projects. Microsoft has promised a Windows Server-friendly distribution of Hadoop supported by Yahoo spin-off Hortonworks, another enterprise-focused Hadoop tools and support provider. It's a safe bet that Oracle, too, will find ways to differentiate its Hadoop offering beyond the promised delivery of the Oracle Big Data Appliance.

Only the largest vendors have had the chutzpah to announce their own Hadoop software distributions and support plans. But dozens of others have added integrations and support tools, so they can move data into and out of Hadoop and analyze data sets after they're boiled down by MapReduce processing. That list includes data warehouse vendors Hewlett-Packard, ParAccel, and Teradata; data integration vendors Informatica, Pervasive, Talend, and Syncsort; and business intelligence and analytics vendors Jaspersoft, Pentaho, and SAS.

The latest wave of Hadoop announcements is coming from application developers and service providers. Amazon has offered a Hadoop-based service on its Elastic Compute Cloud since 2009. IBM launched a BigInsights service on its SmartCloud Enterprise platform in October. And Microsoft is promising a beta Hadoop-based service on the SQL Azure cloud platform by year's end.

Hadoop's Many Pieces

Hadoop Subprojects
Hadoop Common: Common utilities that support the other Hadoop subprojects
Hadoop Distributed File System: Distributed file system that provides high-throughput access to application data
Hadoop MapReduce: Software framework for distributed processing of large data sets on compute clusters

Other Hadoop-Related Apache Projects
Chukwa: Data-collection system for managing large distributed systems
HBase: Scalable, distributed database that supports structured data storage for large tables
Hive: Data warehouse infrastructure that provides data summarization and ad hoc querying
Mahout: Scalable machine learning and data mining library
Pig: High-level data-flow language and execution framework for parallel computing
ZooKeeper: High-performance coordination service for distributed applications

Data: Apache Software Foundation

SunGard plans to launch a Hadoop-based managed service that will let customers run MapReduce jobs. No word on when, but CTO Indu Kodukula says the company will run MapR software on EMC Greenplum's modular appliance. It will aim the service at customers that expect to operate 100 TB or more of data but aren't ready to commit to building out their own infrastructure to support Hadoop.

"Most of the requests that we've received to support Hadoop come from large financial customers that have an enormous amount of data and interest in blending in external sources, but they don't entirely know whether the results are going to be meaningful," Kodukula says. Rather than spending first and risking failure, they'd rather experiment with a managed service, he says.

On the apps front, Tidemark introduced an innovative cloud-based performance management application in October built on an "elastic computation grid based on in-memory technology coupled with Hadoop MapReduce processing." That's a mouthful, but it's simpler than it sounds. The in-memory technology is used for the fast analyses you expect in a performance management app (think Cognos TM1, QlikTech, SAP Hana, and Tibco Spotfire-style financial analyses delivered via the cloud). The Hadoop MapReduce part speeds answers to big data problems and blends mixed data types that might not conform to a fixed schema.

Tidemark customer U.S. Sugar, for example, is mixing weather data with the information it gets from growers related to seeds, chemical treatments, and acres planted to better understand and predict crop production. And Acosta, a marketing services firm that works with consumer products companies, is analyzing consumer sentiments expressed in social media to do a better job of stocking products in support of marketing campaigns.
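
For readers who have not seen the MapReduce pattern, the idea of boiling down mixed records that don't share a fixed schema can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce steps only, not Hadoop code and not how Tidemark or its customers implement anything; the record fields (region, acres, rain_mm), the values, and the function names are all made up for the example.

```python
from collections import defaultdict

# Hypothetical records with no fixed schema: some carry rainfall,
# some carry acreage, some carry both. All values are made up.
records = [
    {"region": "north", "acres": 120},
    {"region": "north", "rain_mm": 30},
    {"region": "south", "acres": 80, "rain_mm": 12},
    {"region": "south", "acres": 40},
]


def map_record(record):
    """Map step: emit ((region, field), value) for each field that is present."""
    region = record["region"]
    for field, value in record.items():
        if field != "region":
            yield (region, field), value


def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce step: boil each group down to a single total."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for record in records for pair in map_record(record))
    return dict(reduce_group(k, v) for k, v in shuffle(pairs).items())


totals = run_job(records)
for key in sorted(totals):
    print(key, totals[key])
```

In a real Hadoop cluster the map and reduce functions run in parallel across many nodes and the framework handles the shuffle; the sketch only shows why records with differing fields pose no problem for the model, since each mapper emits keys for whatever fields it finds.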

All this support for Hadoop will naturally encourage broader experimentation and is likely to boost adoption. According to a recent InformationWeek survey of 431 business technology professionals involved with information management tools, only about 3% have made extensive use of Hadoop or other NoSQL platforms, while 11% have made limited use of them (see chart, below). With all the hype around Hadoop, those figures should begin to rise.

Chart: Limited Hadoop Use -- So Far

It may be that we're at the apex of Gartner's hype cycle, so beware the trough of disillusionment in the months ahead. For one thing, expect a cacophony of confusing commercial messages. Customer success stories and emerging applications will be the best way to gauge Hadoop's progress.

Once Hadoop is proven and mission critical, as it is at AOL, its use will be as routine and accepted as SQL and relational databases are today. It's the right tool for the job when scalability, flexibility, and affordability really matter. That's what all the Hadoopla is about.

Read the sidebar:
Hadoop's Flexibility Wins Over Online Data Provider

Comments
D. Henschen,
User Rank: Author
6/15/2012 | 10:10:09 PM
re: Hadoop Spurs Big Data Revolution
That's per core, but these stats have all been surpassed with the latest hardware.
molloy,
User Rank: Apprentice
12/6/2011 | 4:08:09 AM
re: Hadoop Spurs Big Data Revolution
Reading through the whole document I see only one mention of Yahoo, and no mention of Yahoo as the originator of Hadoop. It sometimes appears that the Press is intent on highlighting all of Yahoo's weaknesses, and none of its strengths. Perhaps you think this information is already well-known, but the pie chart showing that 74% have "no current or planned use" would suggest otherwise. For those who wish to read more meaty detail, see http://developer.yahoo.com/had....
IKODUKULA945,
User Rank: Apprentice
12/4/2011 | 8:59:29 PM
re: Hadoop Spurs Big Data Revolution
Matspca - we're working on establishing a benchmark for Hadoop. If you'd like to participate, please let me know at indu.kodukula@sungard.com
matspca,
User Rank: Apprentice
11/30/2011 | 11:01:45 PM
re: Hadoop Spurs Big Data Revolution
Not everyone believes in the Hype of Hadoop. See http://www.vertica.com/2011/09... The big organizations mentioned here can afford to use non-optimal solutions. I have seen no benchmark showing Hadoop beating, say, Oracle. My own NoSQL database beats Hadoop by a large margin using a $330 PC versus the $1 million (or so) used by Hadoop for the same benchmark. See http://www.velocitydb.com/Comp...

I will continue following the Hype of Hadoop and if there really is some substance behind it then I look forward to a .NET version of the distribution mechanism.
RodneyG79,
User Rank: Apprentice
11/10/2011 | 8:48:32 PM
re: Hadoop Spurs Big Data Revolution
128 MB of RAM for 16 cores? That has to be a typo.