Hortonworks Certifies Spark On YARN, Hadoop - InformationWeek


6/26/2014
02:46 PM


Hortonworks catches up to Cloudera with YARN-managed implementation of Spark in-memory framework for machine learning on Hadoop.

Hortonworks announced Thursday that Apache Spark, a technology quickly gaining interest for in-memory-accelerated machine learning and other analyses on high-scale data, has been certified to run on Apache YARN, the resource-management layer introduced last year with Apache Hadoop 2.0.

With this milestone, Spark is ready to run as a technology preview on the Hortonworks Data Platform (HDP), which is Hortonworks' Hadoop software distribution. A production-certified release is expected by this fall.

This is not the first appearance of Spark on Hadoop. In February, Cloudera introduced support for Spark using its commercial Cloudera Manager software to deploy, manage, and monitor the software. MapR introduced its own Spark deployment in April. Hortonworks stressed that its approach is 100% open source, using YARN (Yet Another Resource Negotiator) to manage and monitor Spark components and workloads alongside other systems and analyses running on Hadoop.

"Spark is now natively integrated into Hadoop, so its resources -- CPU, memory, and so on -- can be managed along with the other workloads running on a Hadoop cluster," explained Shaun Connolly, Hortonworks' VP corporate strategy, in an interview with InformationWeek. "That's important to get right because Spark is memory- and CPU-intensive, and you don't want to have to have siloed clusters dedicated to running those workloads."

The whole point of Hadoop 2.0 and YARN is to be able to run multiple workloads -- including Accumulo, Hive, MapReduce, Pig, Storm, Solr, and now, Spark -- against the same data sets, Connolly added.

Asked for comment on Hortonworks' announcement, Cloudera sent InformationWeek the following statement:

"Cloudera developers were the key drivers on YARN support for Spark, leveraging our expertise in YARN as well [as] our developer group on Spark. Cloudera Manager is not orthogonal to YARN support and in fact, Cloudera Manager supports Spark on YARN. Additionally, almost all our customer deployments of Spark today are on top of the YARN framework and we have many customers who are running Spark through us."

Concurrent with Hortonworks' announcement, Spark developer and support provider Databricks announced that Hortonworks is an inaugural member of its Certified Spark Distribution program.

"We're committed to ensuring all Spark users have a terrific experience -- and we're thrilled that Hortonworks shares this vision," said Databricks business development executive Arsalan Tavakoli-Shiraji in a statement. "With the designation of Apache Spark as YARN Ready, enterprises can rest assured that Spark can run simultaneously and effectively with other mission-critical applications."

Customers are now free to download and install the HDP 2.1 Tech Preview Component of Apache Spark on the current HDP 2.0 distribution. Hortonworks expects the HDP 2.1 release, which will include Spark, to be certified for production use "within a handful of months," said Connolly. Hortonworks will support Spark along with the other software included in the distribution.

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

Comments
srowen
User Rank: Apprentice
7/4/2014 | 1:49:35 PM
Re: That's nice, but ...
I am from Cloudera and have committed about 50 patches to Spark. Same goes for a few other people here. What are you looking at?
ap2snoopy
User Rank: Apprentice
7/4/2014 | 1:34:53 PM
Re: That's nice, but ...
None of the three 'race horses' (MapR, Cloudera, Hortonworks) seems to have contributed to Spark development.

UCB and Databricks (Spark is their main focus) seem to have the most committers.

https://cwiki.apache.org/confluence/display/SPARK/Committers
Charlie Babcock
User Rank: Author
6/30/2014 | 9:28:57 PM
Spark keeps Hadoop competitive
It looks like there is healthy competition among these companies that will do much to keep their respective Hadoop systems competitive. MapR's Spark, Hortonworks' Spark on YARN, and Cloudera Manager's support for Spark are pushing the boundaries of big data.
srowen
User Rank: Apprentice
6/26/2014 | 6:23:02 PM
That's nice, but ...
... would be nicer if more than zero people from Hortonworks had made any contribution to Spark. Or if you could actually run Spark in production with HDP.
D. Henschen
User Rank: Author
6/26/2014 | 4:25:41 PM
Another case of commercial management tool versus open-source management tool
For more on Cloudera's options for implementing Spark, including on YARN, click here. Given Cloudera's use of YARN, the key difference between Cloudera's and Hortonworks' use of Spark seems to boil down to the management software used for deploying, monitoring, and managing the software (YARN handles the workloads). In Cloudera's case it's the commercial Cloudera Manager software. In Hortonworks' case it's the open source Ambari software, but Ambari support is part of what Hortonworks is still working on at this point. Reading between the lines, I wouldn't expect HDP 2.1 to become generally available until this fall.