IBM Bets On Apache Spark As 'The Future Of Enterprise Data' - InformationWeek

InformationWeek is part of the Informa Tech Division of Informa PLC

The key problem Spark resolves is access to data across the enterprise. IBM initiatives include providing courses to train 1 million data scientists and engineers to use it.

IBM is making a major commitment to the future of Apache Spark, with a series of initiatives announced today. IBM will offer Apache Spark as a service on Bluemix; commit 3,500 researchers to work on Spark-related projects; donate IBM SystemML to the Spark ecosystem; and offer courses to train 1 million data scientists and engineers to use Spark.

The commitment to Spark is "right in the heart of what [IBM] has been doing," said Rob Thomas, VP for product development for IBM Analytics, in an interview. The move echoes IBM's earlier commitment to Linux, and even further back to its DB2 database product, he said. But it is rare for IBM to make a technology bet like this one, he added.

"This is the future of enterprise data," Thomas continued. "Anyone using data will have to leverage Spark."

(Image: Geralt via Pixabay)

The key problem Spark resolves is access to data across the enterprise. A typical large corporation will have hundreds, if not thousands, of data sets residing in different databases across its IT systems.

A data scientist can certainly craft an algorithm to plumb the depths of any database. But "it takes a data scientist 90 days of work" to craft that algorithm, Thomas said. "Today, if you port it to another system, you are talking about another 90 days of work" to re-craft and adjust that algorithm to get it working. Spark "eliminates that second 90 days," he said. A Spark-based system can seamlessly and transparently access and analyze any database without additional development or delay.

Another virtue Spark possesses is ease of use. Developers can concentrate on building the solution, instead of building an engine from scratch.

IBM recently sponsored a hackathon during which more than 100 teams crafted new Spark-based apps in about 10 days. One team made a genomic cloud system to analyze DNA samples; another created a search engine to gauge public opinion based on sentiments perceived in text. Thomas pointed to these projects as "proof of concept" to show how quickly a competent team of two or three people can complete a project using Spark.

"The weakest part of Spark is the machine learning piece," Thomas noted. To that end, IBM will make its SystemML machine learning technology available to add learning capability to Spark apps, working with partner Databricks. SystemML is not an algorithm library but an engine that understands algorithms, Thomas said.

While Spark looks promising, nothing will come of it without sufficient numbers of data scientists who actually use it. And data scientists don't grow on trees. IBM wants to educate about 1 million new users through a series of partnerships with AMPLab, DataCamp, MetiStream, Galvanize, and the Big Data University MOOC. The goal here is to make available a "data scientist's work bench" where users who know the R programming language can pick up Spark and its uses very quickly, Thomas said.

Ultimately, it falls to enterprises to make the best use of big data technology such as Spark. "Knowing the problem to solve—that will drive significant business value," Thomas said. CEOs are only beginning to understand how their data can be put to best use. Thomas offered the example of Moneyball, the 2003 book on how the Oakland Athletics used statistical analysis to build a competitive baseball team. "Data can make you think differently," Thomas said. And therein lies the advantage of insight.

William Terdoslavich is an experienced writer with a working understanding of business, information technology, airlines, politics, government, and history, having worked at Mobile Computing & Communications, Computer Reseller News, Tour and Travel News, and Computer Systems ...