Companies already using Hadoop invariably have bigger plans. AOL is moving critical applications to its 700-node production environment, which is described as a highly reliable and controlled deployment that provides data down to a granular level of detail. The 300-node R&D environment is where many of the company's most advanced Ph.D. analytics experts work on cutting-edge projects. Cloudera provides enterprise support for both deployments, helping AOL with bug fixes, software upgrades, and service problems.
At ComScore, it will be several months before Hadoop can scale up and replace the company's data processing grid, Brown says. That move was delayed in part because ComScore switched from Cloudera's Hadoop distribution to MapR's, which it licensed through EMC Greenplum. MapR's version of Hadoop will let ComScore work with its data through the more mature and widely used Network File System (NFS) instead of HDFS. NFS will let the company easily move data back and forth among Hadoop, Sybase IQ, and other data sources and systems, something it couldn't do with HDFS, Brown says.
EMC and partner MapR introduced new Hadoop software and support options this spring, as did IBM with its BigInsights offering. IBM partner Karmasphere, which provides Hadoop development and analytics tools, recently introduced a virtual appliance for BigInsights, designed to speed development of MapReduce jobs and related analytics projects. Microsoft has promised a Windows Server-friendly distribution of Hadoop supported by Yahoo spin-off Hortonworks, another enterprise-focused Hadoop tools and support provider. It's a safe bet that Oracle, too, will find ways to differentiate its Hadoop offering beyond the promised delivery of the Oracle Big Data Appliance.
Only the largest vendors have had the chutzpah to announce their own Hadoop software distributions and support plans. But dozens of others have added integrations and support tools so that they can move data into and out of Hadoop and analyze data sets after they're boiled down by MapReduce processing. That list includes data warehouse vendors Hewlett-Packard, ParAccel, and Teradata; data integration vendors Informatica, Pervasive, Talend, and Syncsort; and business intelligence and analytics vendors Jaspersoft, Pentaho, and SAS.
The latest wave of Hadoop announcements is coming from application developers and service providers. Amazon has offered a Hadoop-based service on its Elastic Compute Cloud since 2009. IBM launched a BigInsights service on its SmartCloud Enterprise platform in October. And Microsoft is promising a beta Hadoop-based service on the SQL Azure cloud platform by year's end.
"Most of the requests that we've received to support Hadoop come from large financial customers that have an enormous amount of data and an interest in blending in external sources, but they don't entirely know whether the results are going to be meaningful," Kodukula says. Instead of investing up front and risking failure, they would prefer to experiment with a managed service, he says.
On the apps front, Tidemark introduced an innovative cloud-based performance management application in October built on an "elastic computation grid based on in-memory technology coupled with Hadoop MapReduce processing." That's a mouthful, but it's simpler than it sounds. The in-memory technology handles the fast analyses you expect in a performance management app (think Cognos TM1, QlikView, SAP HANA, and Tibco Spotfire-style financial analyses delivered via the cloud). The Hadoop MapReduce part speeds answers to big data problems and blends mixed data types that might not conform to a fixed schema.
Tidemark customer U.S. Sugar, for example, is mixing weather data with the information it gets from growers related to seeds, chemical treatments, and acres planted to better understand and predict crop production. And Acosta, a marketing services firm that works with consumer products companies, is analyzing consumer sentiments expressed in social media to do a better job of stocking products in support of marketing campaigns.
All this support for Hadoop will naturally encourage broader experimentation and is likely to boost adoption. According to a recent InformationWeek survey of 431 business technology professionals involved with information management tools, only about 3% have made extensive use of Hadoop or other NoSQL platforms, while 11% have made limited use of them (see chart below). With all the hype around Hadoop, those figures should begin to rise.
We may be at the apex of Gartner's hype cycle, so beware the trough of disillusionment in the months ahead. For one thing, expect a cacophony of confusing commercial messages. Customer success stories and emerging applications will be the best way to gauge Hadoop's progress.
Once Hadoop is proven and mission critical, as it is at AOL, its use will be as routine and accepted as SQL and relational databases are today. It's the right tool for the job when scalability, flexibility, and affordability really matter. That's what all the Hadoopla is about.
Hadoop's Flexibility Wins Over Online Data Provider