Hadoop 2.0: New Big Data Possibilities - InformationWeek
11:28 AM
Doug Henschen

Hadoop 2.0: New Big Data Possibilities

Hadoop 2.0 will move beyond batch processing to support interactive, online and streaming applications. But don't let warnings about YARN tie you up in knots.

You could use YARN to allocate resources to Impala, for example, but then Enterprise RTQ would manage the concurrent queries running inside Impala. "If these queries went directly to YARN alone -- and you can look at how it works with Hive and Stinger, for example -- you no longer have two systems managing resources on the cluster and you don't have to reinvent everything from multi-tenancy to security and all of those things," Murthy says.

This is a Hortonworks take on Hadoop, as that company is sticking with Hive and working on project Stinger as a way to drive faster query performance (45X in recent tests, according to Murthy). Impala, HAWQ and other SQL-on-Hadoop projects are offering alternatives to Hive that don't rely on MapReduce running behind the scenes (as is the case with Hive). With last month's release of Cloudera Impala, company CEO Mike Olson said of Hive, "we don't believe that it's going to be possible to drive down latencies and improve performance sufficiently via that platform."

This is a side issue that doesn't take anything away from support of Hadoop 2.0 or the value of YARN. We've asked Cloudera, MapR and others for their positions on YARN, and thus far the statements of support are universal.

According to Charles Zedlewski, VP of products, Cloudera is not only contributing to YARN development and shipping a preview version in its software distribution, it's also "undertaking some developments in Impala to better take advantage of YARN. With the way Impala was designed there is no overlapping resource management or security."

MapR is "working with the community to enhance YARN and make it more valuable," said MapR VP of Marketing Jack Norris. "For example, we are the primary contributors to Apache Drill, which is the YARN-based SQL-on-Hadoop solution." (Drill is in development and will be MapR's answer to Cloudera Impala and Stinger-improved Hive.)

Customers will inevitably rule the outcome if there is any debate. If most organizations are determined to control Hadoop resources using YARN, it will be easy enough for commercial tools to be rearchitected to defer to YARN resource management controls. If proprietary tools offer some measure of added value, we just might see overlapping administrative controls. It won't be the first time.

The main point on the pending release of Hadoop and YARN is that the platform is maturing, Murthy maintains. It's a step that will help move the platform beyond the early Web-company implementations into the diverse demands of the enterprise market.

"YARN makes it easy for a lot of applications to come into the Hadoop ecosystem and it gives you a significantly better return on your Hadoop cluster," he says. "You can manage applications one way, operate one way, monitor one way and drive down the cost of running your entire data architecture."

D. Henschen,
User Rank: Author
5/22/2013 | 2:17:36 PM
re: Hadoop 2.0: New Big Data Possibilities
I had questions about the timing of Hadoop 2.0 and YARN, but the response from Arun Murthy came in too late for publication. It's the beta version that will be announced within a matter of days. When will it reach GA? The short answer is the second half of 2013 and into 2014, but here's Murthy's statement with more detail:

"Apache Hadoop 2.0 and YARN have been under development for 2.5 to 3 years, will be reaching final Beta shortly, with a push to final stable release within the Apache community a matter of weeks after that. At that point, MapReduce (batch data processing) and Apache Tez (interactive data processing) will be two application types that are fully tested to run on YARN. Community projects such as S4, Storm, Giraph, OpenMPI and other open source projects have been doing work to be first-class YARN applications as well, so they will now have a stable platform release to test against and finish their efforts. Commercial vendors and startups have also been doing work around YARN. For example, Continuuity is a startup that created an open source framework called Weave that makes it easy to create YARN applications.

Bottom-line: the next wave of innovation on top of YARN has been underway for a while. How long will it take for the market to adopt Hadoop 2.0? ... Initial uptake of Hadoop 2.0 based solutions with YARN will start in the second half of 2013 with broader market adoption happening throughout 2014."