Hadoop 2.0 Goes GA: New Workloads Await - InformationWeek

10/16/2013
12:54 PM

Apache's Hadoop upgrade, now in general availability, goes beyond MapReduce and promises better options for SQL-style querying, graph analysis and stream processing.

Standard advice in public speaking is to tell the audience what you're going to tell them, then tell them and, finally, tell them what you told them.

So it is in open source software release cycles. The Apache Software Foundation and its self-appointed surrogate, Hortonworks, have been telling us what's coming in Hadoop 2.0. They recently told us again that the release was imminent. On Wednesday came the announcement that it's finally here, meaning generally available for download.

Do you need to hear, yet again, what's new in Hadoop 2.0? The big new piece is YARN (an acronym for Yet Another Resource Negotiator), a cluster resource management layer that will enable Hadoop to handle much more than batch-oriented MapReduce jobs. With YARN you can allocate cluster capacity to meet the service-level demands of particular workloads.

Thus, in Hadoop 2.0, big, resource-sucking MapReduce jobs can co-exist with HBase workloads and Hive queries, for example. In the Hadoop 1.0 world, companies often deployed separate clusters for HBase and MapReduce work in order to avoid system contention.

YARN also promises better support for a host of emerging Hadoop workloads, including Storm, a stream-processing platform open sourced by Twitter; Apache Giraph, the open source graph analysis engine; and Spark, a tool for in-memory analytics on top of Hadoop. Storm recently became an Apache incubator project, and Hortonworks announced Monday that it will make a preview Storm integration available in Q4, to be followed by a general release in the Hortonworks Data Platform in Q1 of 2014.

"One of the most common use cases that we see emerging from our customers is ... stream processing in Hadoop," wrote Bob Page, VP of products at Hortonworks, in a blog post this week. "Early adopters are using stream processing to analyze some of the most common new types of data such as sensor and machine data in real time."

Hadoop 2.0 will also better support SQL-on-Hadoop options, though each Hadoop distributor seems to have its own prescription for how best to handle that big demand. Cloudera's answer is Impala. Hortonworks is sticking with Hive, which is supported with new elements of Hadoop 2.0 including the Tez execution engine. IBM has Big SQL, and MapR has proposed Apache Drill. Pivotal is promoting its HAWQ technology, which is derived in part from its Greenplum database.

At InformationWeek, we've recently observed that relational database vendors including Oracle and Teradata have been dwelling on the shortcomings of Hadoop, but mostly they are looking backward at Hadoop 1.0. To tell you once more what's coming in Hadoop 2.0: think beyond batch MapReduce toward new, resource-managed workloads including SQL-like querying, HBase NoSQL database operations, Giraph graph analysis and Storm real-time processing.
