Hortonworks Certifies Spark On YARN, Hadoop - InformationWeek
Hortonworks catches up to Cloudera with YARN-managed implementation of Spark in-memory framework for machine learning on Hadoop.

Hortonworks announced Thursday that Apache Spark, a technology quickly gaining interest for in-memory-accelerated machine learning and other analyses on high-scale data, has been certified to run on Apache YARN, the resource-management layer introduced last year with Apache Hadoop 2.0.

With this milestone, Spark is ready to run as a technology preview on the Hortonworks Data Platform (HDP), which is Hortonworks' Hadoop software distribution. A production-certified release is expected by this fall.

This is not the first appearance of Spark on Hadoop. In February, Cloudera introduced support for Spark using its commercial Cloudera Manager software to deploy, manage, and monitor the software. MapR introduced its own Spark deployment in April. Hortonworks stressed that its approach is 100% open source, using YARN (Yet Another Resource Negotiator) to manage and monitor Spark components and workloads alongside other systems and analyses running on Hadoop.

"Spark is now natively integrated into Hadoop, so its resources -- CPU, memory, and so on -- can be managed along with the other workloads running on a Hadoop cluster," explained Shaun Connolly, Hortonworks' VP corporate strategy, in an interview with InformationWeek. "That's important to get right because Spark is memory- and CPU-intensive, and you don't want to have to have siloed clusters dedicated to running those workloads."

The whole point of Hadoop 2.0 and YARN is to be able to run multiple workloads -- including Accumulo, Hive, MapReduce, Pig, Storm, Solr, and now, Spark -- against the same data sets, Connolly added.

Asked for comment on Hortonworks' announcement, Cloudera sent InformationWeek the following statement:

"Cloudera developers were the key drivers on YARN support for Spark, leveraging our expertise in YARN as well our developer group on Spark. Cloudera Manager is not orthogonal to YARN support and in fact, Cloudera Manager supports Spark on YARN. Additionally, almost all our customer deployments of Spark today are on top of the YARN framework and we have many customers who are running Spark through us."

Concurrent with Hortonworks' announcement, Spark developer and support provider Databricks announced that Hortonworks is an inaugural member of its Certified Spark Distribution program.

"We're committed to ensuring all Spark users have a terrific experience -- and we're thrilled that Hortonworks shares this vision," said Databricks business development executive Arsalan Tavakoli-Shiraji in a statement. "With the designation of Apache Spark as YARN Ready, enterprises can rest assured that Spark can run simultaneously and effectively with other mission-critical applications."

Customers are now free to download and install the HDP 2.1 Tech Preview Component of Apache Spark on the current HDP 2.0 distribution. Hortonworks expects the HDP 2.1 release, which will include Spark, to be certified for production use "within a handful of months," said Connolly. Hortonworks will support Spark along with the other software included in the distribution.

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

User Rank: Apprentice
7/4/2014 | 1:49:35 PM
Re: That's nice, but ...
I am from Cloudera and have committed about 50 patches to Spark. Same goes for a few other people here. What are you looking at?
User Rank: Apprentice
7/4/2014 | 1:34:53 PM
Re: That's nice, but ...
None of the three 'race horses' (MapR, Cloudera, Hortonworks) seem to have contributed to Spark development.

UCB and Databricks (Spark is their main focus) seem to have the most committers.
Charlie Babcock
User Rank: Author
6/30/2014 | 9:28:57 PM
Spark keeps Hadoop competitive
It looks like there is healthy competition among these companies that will do much to keep their respective Hadoop systems competitive. MapR Spark, Hortonworks Spark on YARN, and Cloudera Manager's support for Spark are pushing the boundaries of big data.
User Rank: Apprentice
6/26/2014 | 6:23:02 PM
That's nice, but ...
... would be nicer if more than 0 people from HortonWorks made any contribution to Spark. Or you could actually run Spark in production with HDP.
D. Henschen
User Rank: Author
6/26/2014 | 4:25:41 PM
Another case of commercial management tool versus open-source management tool
Cloudera offers several options for implementing Spark, including on YARN. Given Cloudera's use of YARN, the key difference between Cloudera's and Hortonworks' use of Spark seems to boil down to the management software used for deploying, monitoring, and managing the software (YARN handles the workloads). In Cloudera's case it's the commercial Cloudera Manager software. In Hortonworks' case it's the open source Ambari software, but Ambari support is part of what Hortonworks is still working on at this point. Reading between the lines, I would not expect HDP 2.1 to become generally available until this fall.