Hadoop 2.0 Goes GA: New Workloads Await - InformationWeek
Data Management // Big Data Analytics
12:54 PM
Connect Directly

Hadoop 2.0 Goes GA: New Workloads Await

Apache's Hadoop upgrade, now in general availability, goes beyond MapReduce and promises better options for SQL-style querying, graph analysis and stream processing.

Standard advice in public speaking is to tell the audience what you're going to tell them, then tell them and, finally, tell them what you told them.

So it is in open source software release cycles. The Apache Foundation and its self-appointed surrogate, Hortonworks, have been telling us what's coming in Hadoop 2.0. They recently told us again that the release was imminent. On Wednesday came the announcement that it's finally here, meaning generally available for download.

Do you need to hear, yet again, what's new in Hadoop 2.0? The big new piece is YARN (a mangled acronym for Yet Another Resource Manager), a cluster resource management layer that will enable Hadoop to handle much more than batch-oriented MapReduce jobs. With YARN you can assign cluster capacity accordingly in order to meet the service level demands of particular workloads.

Thus, in Hadoop 2.0, big, resource-sucking MapReduce jobs can co-exist with HBase workloads and Hive queries, for example. In the Hadoop 1.0 world, companies often deployed separate clusters for HBase and MapReduce work in order to avoid system contention.

[ Want more on Teradata's alternative for many Hadoop workloads? Read Teradata Brings Graph Analysis To SQL. ]

YARN also promises better support for a host of emerging Hadoop workloads, including Storm, a stream-processing platform developed by Twitter, Apache Giraph, the open source graph analysis engine, and Spark, a tool for in-memory analytics on top of Hadoop. Storm recently officially became an Apache open source project, and Hortonworks announced Monday that it will make a preview Storm integration available in Q4 to be followed by a general release in the Hortonworks Data Platform in Q1 of 2014.

"One of the most common use cases that we see emerging from our customers is ... stream processing in Hadoop," wrote Bob Page, VP of products at Hortonworks, in a blog this week. "Early adopters are using stream processing to analyze some of the most common new types of data such as sensor and machine data in real time."

Hadoop 2.0 will also better support SQL-on-Hadoop options, though each Hadoop distributor seems to have its own prescription for how best to handle that big demand. Cloudera's answer is Impala. Hortonworks is sticking with Hive, which is supported with new elements of Hadoop 2.0 including the Tez execution engine. IBM has BigSQL, MapR has proposed Apache Drill. Pivotal is promoting its HAWQ technology, which is derived in part from its Greenplum database.

At InformationWeek, we've recently observed that relational database vendors including Oracle and Teradata have been dwelling on the shortcomings of Hadoop, but mostly it's a look backwards at Hadoop 1.0. To tell you again what's coming in Hadoop 2.0, think beyond batch MapReduce toward new, resource-managed workloads including SQL-like querying, HBase NoSQL database operations, Giraph graph analysis and Storm real-time processing.

IT leaders must know the trade-offs they face to get NoSQL's scalability, flexibility and cost savings. Also in the When NoSQL Makes Sense issue of InformationWeek: Oregon's experience building an Obamacare exchange. (Free registration required.)

Comment  | 
Print  | 
More Insights
Newest First  |  Oldest First  |  Threaded View
How Enterprises Are Attacking the IT Security Enterprise
How Enterprises Are Attacking the IT Security Enterprise
To learn more about what organizations are doing to tackle attacks and threats we surveyed a group of 300 IT and infosec professionals to find out what their biggest IT security challenges are and what they're doing to defend against today's threats. Download the report to see what they're saying.
Register for InformationWeek Newsletters
White Papers
Current Issue
2017 State of IT Report
In today's technology-driven world, "innovation" has become a basic expectation. IT leaders are tasked with making technical magic, improving customer experience, and boosting the bottom line -- yet often without any increase to the IT budget. How are organizations striking the balance between new initiatives and cost control? Download our report to learn about the biggest challenges and how savvy IT executives are overcoming them.
Twitter Feed
Sponsored Live Streaming Video
Everything You've Been Told About Mobility Is Wrong
Attend this video symposium with Sean Wisdom, Global Director of Mobility Solutions, and learn about how you can harness powerful new products to mobilize your business potential.
Flash Poll