Hortonworks catches up to Cloudera with YARN-managed implementation of Spark in-memory framework for machine learning on Hadoop.
Hortonworks announced Thursday that Apache Spark, a technology quickly gaining interest for in-memory-accelerated machine learning and other analyses on high-scale data, has been certified to run on Apache YARN, the resource-management layer introduced last year with Apache Hadoop 2.0.
With this milestone, Spark is ready to run as a technology preview on the Hortonworks Data Platform (HDP), which is Hortonworks' Hadoop software distribution. A production-certified release is expected by this fall.
This is not the first appearance of Spark on Hadoop. In February, Cloudera introduced support for Spark using its commercial Cloudera Manager software to deploy, manage, and monitor the software. MapR introduced its own Spark deployment in April. Hortonworks stressed that its approach is 100% open source, using YARN (Yet Another Resource Negotiator) to manage and monitor Spark components and workloads alongside other systems and analyses running on Hadoop.
"Spark is now natively integrated into Hadoop, so its resources -- CPU, memory, and so on -- can be managed along with the other workloads running on a Hadoop cluster," explained Shaun Connolly, Hortonworks' VP corporate strategy, in an interview with InformationWeek. "That's important to get right because Spark is memory- and CPU-intensive, and you don't want to have to have siloed clusters dedicated to running those workloads."
The whole point of Hadoop 2.0 and YARN is to be able to run multiple workloads -- including Accumulo, Hive, MapReduce, Pig, Storm, Solr, and now, Spark -- against the same data sets, Connolly added.
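To make the resource-management point concrete, here is a minimal sketch of how a Spark job might ask YARN for a bounded share of a cluster's CPU and memory. It is not taken from the Hortonworks preview: the helper name `build_spark_submit`, the application path, the queue name, and the resource values are all hypothetical, and the flags follow current `spark-submit` usage, which may differ from what the 2014 tech preview accepted.

```python
import shlex


def build_spark_submit(app_path, num_executors=4, executor_memory="2g",
                       executor_cores=2, queue="default"):
    """Build a spark-submit command line that requests resources from YARN.

    Hypothetical helper for illustration only; the default values are
    placeholders, not recommendations.
    """
    return [
        "spark-submit",
        "--master", "yarn",                       # let YARN schedule the job
        "--deploy-mode", "cluster",               # run the driver inside the cluster
        "--queue", queue,                         # YARN queue shared with other workloads
        "--num-executors", str(num_executors),    # number of executor containers
        "--executor-memory", executor_memory,     # memory per executor
        "--executor-cores", str(executor_cores),  # CPU cores per executor
        app_path,
    ]


if __name__ == "__main__":
    # Print the command rather than running it, since no cluster is assumed.
    print(shlex.join(build_spark_submit("my_app.py")))
```

Because the executor count, memory, and cores are stated up front, YARN can place the Spark job next to other workloads on the same cluster instead of requiring a dedicated one.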
Asked for comment on Hortonworks' announcement, Cloudera sent InformationWeek the following statement:
"Cloudera developers were the key drivers on YARN support for Spark, leveraging our expertise in YARN as well our developer group on Spark. Cloudera Manager is not orthogonal to YARN support and in fact, Cloudera Manager supports Spark on YARN. Additionally, almost all our customer deployments of Spark today are on top of the YARN framework and we have many customers who are running Spark through us."
Concurrent with Hortonworks' announcement, Spark developer and support provider Databricks announced that Hortonworks is an inaugural member of its Certified Spark Distribution program.
"We're committed to ensuring all Spark users have a terrific experience -- and we're thrilled that Hortonworks shares this vision," said Databricks business development executive Arsalan Tavakoli-Shiraji in a statement. "With the designation of Apache Spark as YARN Ready, enterprises can rest assured that Spark can run simultaneously and effectively with other mission-critical applications."
Customers are now free to download and install the HDP 2.1 Tech Preview Component of Apache Spark on the current HDP 2.0 distribution. Hortonworks expects the HDP 2.1 release, which will include Spark, to be certified for production use "within a handful of months," said Connolly. Hortonworks will support Spark along with the other software included in the distribution.
Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...