6/26/2014 02:46 PM

Hortonworks Certifies Spark On YARN, Hadoop

Hortonworks catches up to Cloudera with YARN-managed implementation of Spark in-memory framework for machine learning on Hadoop.

Hortonworks announced Thursday that Apache Spark, a technology quickly gaining interest for in-memory-accelerated machine learning and other analyses on high-scale data, has been certified to run on Apache YARN, the resource-management layer introduced last year with Apache Hadoop 2.0.

With this milestone, Spark is ready to run as a technology preview on the Hortonworks Data Platform (HDP), which is Hortonworks' Hadoop software distribution. A production-certified release is expected by this fall.

This is not the first appearance of Spark on Hadoop. In February, Cloudera introduced support for Spark using its commercial Cloudera Manager software to deploy, manage, and monitor the software. MapR introduced its own Spark deployment in April. Hortonworks stressed that its approach is 100% open source, using YARN (Yet Another Resource Negotiator) to manage and monitor Spark components and workloads alongside other systems and analyses running on Hadoop.

"Spark is now natively integrated into Hadoop, so its resources -- CPU, memory, and so on -- can be managed along with the other workloads running on a Hadoop cluster," explained Shaun Connolly, Hortonworks' VP of corporate strategy, in an interview with InformationWeek. "That's important to get right because Spark is memory- and CPU-intensive, and you don't want to have to have siloed clusters dedicated to running those workloads."

The whole point of Hadoop 2.0 and YARN is to be able to run multiple workloads -- including Accumulo, Hive, MapReduce, Pig, Storm, Solr, and now, Spark -- against the same data sets, Connolly added.

Asked for comment on Hortonworks' announcement, Cloudera sent InformationWeek the following statement:

"Cloudera developers were the key drivers on YARN support for Spark, leveraging our expertise in YARN as well our developer group on Spark. Cloudera Manager is not orthogonal to YARN support and in fact, Cloudera Manager supports Spark on YARN. Additionally, almost all our customer deployments of Spark today are on top of the YARN framework and we have many customers who are running Spark through us."

Concurrent with Hortonworks' announcement, Spark developer and support provider Databricks announced that Hortonworks is an inaugural member of its Certified Spark Distribution program.

"We're committed to ensuring all Spark users have a terrific experience -- and we're thrilled that Hortonworks shares this vision," said Databricks business development executive Arsalan Tavakoli-Shiraji in a statement. "With the designation of Apache Spark as YARN Ready, enterprises can rest assured that Spark can run simultaneously and effectively with other mission-critical applications."

Customers are now free to download and install the HDP 2.1 Tech Preview Component of Apache Spark on the current HDP 2.0 distribution. Hortonworks expects the HDP 2.1 release, which will include Spark, to be certified for production use "within a handful of months," said Connolly. Hortonworks will support Spark along with the other software included in the distribution.

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

Comments
srowen
User Rank: Apprentice
7/4/2014 | 1:49:35 PM
Re: That's nice, but ...
I am from Cloudera and have committed about 50 patches to Spark. Same goes for a few other people here. What are you looking at?
ap2snoopy
User Rank: Apprentice
7/4/2014 | 1:34:53 PM
Re: That's nice, but ...
None of the three 'race horses' (MapR, Cloudera, Hortonworks) seem to have contributed to Spark development.

UCB and Databricks (Spark is their main focus) seem to have the most committers.

https://cwiki.apache.org/confluence/display/SPARK/Committers
Charlie Babcock
User Rank: Author
6/30/2014 | 9:28:57 PM
Spark keeps Hadoop competitive
It looks like there is healthy competition among these companies that will do much to keep their respective Hadoop systems competitive. MapR Spark, Hortonworks Spark on YARN and Cloudera Manager's support for Spark are pushing the boundaries of big data.
srowen
User Rank: Apprentice
6/26/2014 | 6:23:02 PM
That's nice, but ...
... would be nicer if more than 0 people from Hortonworks made any contribution to Spark. Or if you could actually run Spark in production with HDP.
D. Henschen
User Rank: Author
6/26/2014 | 4:25:41 PM
Another case of commercial management tool versus open-source management tool
For more on Cloudera's options for implementing Spark, including on YARN, click here. Given Cloudera's use of YARN, the key difference between Cloudera's and Hortonworks' use of Spark seems to boil down to the management software used for deploying, monitoring, and managing the software (YARN manages the workloads). In Cloudera's case it's commercial Cloudera Manager software. In Hortonworks' case it's open source Ambari software, but Ambari support is part of what Hortonworks is still working on at this point. Reading between the lines, I wouldn't expect HDP 2.1 to become generally available until this fall.