Commentary
5/21/2013
11:28 AM
Doug Henschen

Hadoop 2.0: New Big Data Possibilities

Hadoop 2.0 will move beyond batch processing to support interactive, online and streaming applications. But don't let warnings about YARN tie you up in knots.

You could use YARN to allocate resources to Impala, for example, but then Enterprise RTQ would manage the concurrent queries running inside Impala. "If these queries went directly to YARN alone -- and you can look at how it works with Hive and Stinger, for example -- you no longer have two systems managing resources on the cluster and you don't have to reinvent everything from multi-tenancy to security and all of those things," Murthy says.

This is a Hortonworks take on Hadoop, as that company is sticking with Hive and working on project Stinger as a way to drive faster query performance (45X in recent tests, according to Murthy). Impala, HAWQ and other SQL-on-Hadoop projects offer alternatives to Hive that don't rely on MapReduce running behind the scenes, as Hive does. With last month's release of Cloudera Impala, company CEO Mike Olson said of Hive, "we don't believe that it's going to be possible to drive down latencies and improve performance sufficiently via that platform."

This is a side issue that doesn't take anything away from support of Hadoop 2.0 or the value of YARN. We've asked Cloudera, MapR and others for their positions on YARN, and thus far the statements of support are universal.

Cloudera is not only contributing to YARN development and shipping a preview version in its software distribution, according to Charles Zedlewski, VP of products; it's also "undertaking some developments in Impala to better take advantage of YARN. With the way Impala was designed there is no overlapping resource management or security."

MapR is "working with the community to enhance YARN and make it more valuable," said MapR VP of Marketing Jack Norris. "For example, we are the primary contributors to Apache Drill, which is the YARN-based SQL-on-Hadoop solution." (Drill is in development and will be MapR's answer to Cloudera Impala and Stinger-improved Hive.)

Customers will inevitably rule the outcome if there is any debate. If most organizations are determined to control Hadoop resources using YARN, it will be easy enough for commercial tools to be rearchitected to defer to YARN resource management controls. If proprietary tools offer some measure of added value, we just might see overlapping administrative controls. It won't be the first time.

The main point on the pending release of Hadoop and YARN is that the platform is maturing, Murthy maintains. It's a step that will help move the platform beyond the early Web-company implementations into the diverse demands of the enterprise market.

"YARN makes it easy for a lot of applications to come into the Hadoop ecosystem and it gives you a significantly better return on your Hadoop cluster," he says. "You can manage applications one way, operate one way, monitor one way and drive down the cost of running your entire data architecture."

Comments
D. Henschen,
User Rank: Author
5/22/2013 | 2:17:36 PM
re: Hadoop 2.0: New Big Data Possibilities
I had questions about the timing of Hadoop 2.0 and YARN, but the response from Arun Murthy came in too late for publication. It's the beta version that will be announced within a matter of days. When will it reach GA? The short answer is the second half of 2013 and into 2014, but here's Murthy's statement with more detail:

"Apache Hadoop 2.0 and YARN have been under development for 2.5 to 3 years, will be reaching final Beta shortly, with a push to final stable release within the Apache community a matter of weeks after that. At that point, MapReduce (batch data processing) and Apache Tez (interactive data processing) will be two application types that are fully tested to run on YARN. Community projects such as S4, Storm, Giraph, OpenMPI and other open source projects have been doing work to be first-class YARN applications as well, so they will now have a stable platform release to test against and finish their efforts. Commercial vendors and startups have also been doing work around YARN. For example, Continuuity is a startup that created an open source framework called Weave that makes it easy to create YARN applications.

Bottom-line: the next wave of innovation on top of YARN has been underway for a while. How long will it take for the market to adopt Hadoop 2.0? Initial uptake of Hadoop 2.0 based solutions with YARN will start in the second half of 2013 with broader market adoption happening throughout 2014."