MapR Brings Spark In-Memory Analysis To Hadoop - InformationWeek


11:33 AM


MapR adds Apache Spark to its Hadoop distribution to power machine learning plus ad hoc, graph, and streaming analysis. Databricks partners on support.


MapR announced Thursday that it's bringing Apache Spark software and support to its Hadoop distributions. Both are available immediately across all of its Hadoop distributions through a partnership with Spark backer Databricks.

Spark is quickly establishing itself as a leading environment for doing fast, iterative in-memory and streaming analysis. The software can run stand-alone in a clustered environment, but it can also run on top of Hadoop by way of the YARN resource manager introduced last year in Hadoop 2.0.

The Spark software stack was created and turned over to open source by Databricks, a commercial company that certifies related software and offers installation and ongoing management support. The stack includes the core data-processing engine, an interface to Hive for interactive querying, Spark Streaming for streaming data analysis, and growing libraries for machine-learning and graph analysis.


"People are really excited about using Spark because it's a way around traditional multi-step processing on Hadoop," said Anoop Dawar, senior director of product management at MapR. "Spark provides a fast way to do iterative machine-learning and model-learning because it caches results in memory for continuous analysis."

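The in-memory caching Dawar describes can be illustrated without Spark itself. The sketch below is plain Python, not Spark's API: `CountingSource`, `cached`, and `fit_slope` are hypothetical names invented for this illustration, and the tiny dataset is made up. It shows only the general idea that an iterative algorithm which rereads its input on every pass benefits when that input is held in memory after the first read.

```python
from typing import Callable, List, Tuple

Row = Tuple[float, float]


class CountingSource:
    """Stands in for a slow, disk-backed dataset and counts how often it is read."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = list(rows)
        self.reads = 0

    def load(self) -> List[Row]:
        self.reads += 1
        return list(self._rows)


def cached(loader: Callable[[], List[Row]]) -> Callable[[], List[Row]]:
    """Wrap a loader so the data is read once and then served from memory."""
    store: List[List[Row]] = []

    def get() -> List[Row]:
        if not store:
            store.append(loader())
        return store[0]

    return get


def fit_slope(get_rows: Callable[[], List[Row]],
              iterations: int = 20, lr: float = 0.05) -> float:
    """Gradient descent for y = w * x; asks for the rows once per iteration."""
    w = 0.0
    for _ in range(iterations):
        rows = get_rows()
        grad = sum(2 * (w * x - y) * x for x, y in rows) / len(rows)
        w -= lr * grad
    return w


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

    uncached = CountingSource(data)
    fit_slope(uncached.load)

    in_memory = CountingSource(data)
    fit_slope(cached(in_memory.load))

    print("reads without cache:", uncached.reads)
    print("reads with cache:", in_memory.reads)
```

Both runs arrive at the same slope; the only difference is how many times the underlying source is read. In this toy version the cache is a single in-process list, whereas a cluster framework would have to keep the data in memory across many machines.
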
MapR adds the Spark stack, highlighted in gray, to its list of more than 20 supported Apache open source projects.

Spark also supports interactive, ad hoc exploration of data, using Hive, for example, and streaming analysis applications such as network threat detection and fraud risk analysis. In the streaming role it's used in combination with tools such as Kafka and Flume.

Cloudera was the first Hadoop distributor to add Spark software and support, with its Cloudera Enterprise release in February. MapR will ship Spark software with its M3, M5, and M7 software distributions and offers optional Spark support. MapR will handle first-level and second-level support for software installation and day-to-day management. When higher-level expertise is required, MapR can call in Databricks domain experts, but MapR maintains case management so "it's not a cold handoff," according to Dawar.


Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of ...

Charlie Babcock
User Rank: Author
4/11/2014 | 7:45:38 PM
The powerful Hadoop platform
Hadoop is one of those brilliantly simple platforms -- a distributed file system on top of distributed processing combined with data mapping -- on which many increasingly sophisticated systems may be built. Good description here of Spark streaming analysis; it's probably one of them.
Michael Franklin
User Rank: Apprentice
4/14/2014 | 12:35:45 PM
Clarification of Spark's Origin
MapR's announcement is indeed an important milestone in the progress of Spark as an enterprise solution. However, I need to correct one key point in your article. Spark, Shark, Spark Streaming, MLlib, etc. were all developed at the UC Berkeley AMPLab and have been open source since their inception. They are components of the Berkeley Data Analytics Stack (BDAS), which has been and continues to be developed by students and researchers in the AMPLab. Databricks is a company that spun out of the lab and was founded by many of the key developers of Spark.