10/16/2013
12:54 PM

Hadoop 2.0 Goes GA: New Workloads Await

Apache's Hadoop upgrade, now in general availability, goes beyond MapReduce and promises better options for SQL-style querying, graph analysis and stream processing.

Standard advice in public speaking is to tell the audience what you're going to tell them, then tell them and, finally, tell them what you told them.

So it is in open source software release cycles. The Apache Software Foundation and its self-appointed surrogate, Hortonworks, have been telling us what's coming in Hadoop 2.0. They recently told us again that the release was imminent. On Wednesday came the announcement that it's finally here, meaning generally available for download.

Do you need to hear, yet again, what's new in Hadoop 2.0? The big new piece is YARN (short for Yet Another Resource Negotiator), a cluster resource management layer that will enable Hadoop to handle much more than batch-oriented MapReduce jobs. With YARN you can assign cluster capacity to particular workloads in order to meet their service-level demands.

Thus, in Hadoop 2.0, big, resource-sucking MapReduce jobs can co-exist with HBase workloads and Hive queries, for example. In the Hadoop 1.0 world, companies often deployed separate clusters for HBase and MapReduce work in order to avoid system contention.
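
As a rough illustration of the idea of dividing one cluster among workloads, the toy Python sketch below splits a pool of containers by percentage share. It is not YARN's API or configuration format; the function name, queue names, container count and percentages are all hypothetical and chosen only for illustration.

```python
# Toy model only: this is NOT YARN's API or its scheduler logic.
# Queue names, the container count and the percentages are hypothetical.

def allocate_capacity(total_containers, queue_shares):
    """Split a cluster's containers among queues by percentage share."""
    if sum(queue_shares.values()) != 100:
        raise ValueError("queue shares must sum to 100 percent")
    # Floor division keeps every queue at or below its configured share.
    return {
        queue: total_containers * pct // 100
        for queue, pct in queue_shares.items()
    }


# One shared cluster instead of separate clusters per workload.
shares = {"mapreduce": 50, "hbase": 30, "hive": 20}
allocation = allocate_capacity(200, shares)
print(allocation)
```

In this sketch the batch MapReduce queue cannot crowd out the HBase or Hive queues, which is the kind of co-existence described above. A real scheduler also handles details this model ignores, such as lending idle capacity to busy queues.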

YARN also promises better support for a host of emerging Hadoop workloads, including Storm, a stream-processing platform developed by Twitter; Apache Giraph, the open source graph analysis engine; and Spark, a tool for in-memory analytics on top of Hadoop. Storm recently became an official Apache open source project, and Hortonworks announced Monday that it will make a preview Storm integration available in Q4, to be followed by a general release in the Hortonworks Data Platform in Q1 of 2014.

"One of the most common use cases that we see emerging from our customers is ... stream processing in Hadoop," wrote Bob Page, VP of products at Hortonworks, in a blog post this week. "Early adopters are using stream processing to analyze some of the most common new types of data such as sensor and machine data in real time."

Hadoop 2.0 will also better support SQL-on-Hadoop options, though each Hadoop distributor seems to have its own prescription for how best to handle that big demand. Cloudera's answer is Impala. Hortonworks is sticking with Hive, which is supported by new elements of Hadoop 2.0 including the Tez execution engine. IBM has BigSQL, and MapR has proposed Apache Drill. Pivotal is promoting its HAWQ technology, which is derived in part from its Greenplum database.

At InformationWeek, we've recently observed that relational database vendors including Oracle and Teradata have been dwelling on the shortcomings of Hadoop, but mostly they are looking backward at Hadoop 1.0. To tell you again what's coming in Hadoop 2.0, think beyond batch MapReduce toward new, resource-managed workloads including SQL-like querying, HBase NoSQL database operations, Giraph graph analysis and Storm real-time processing.

