IBM Bets On Apache Spark As 'The Future Of Enterprise Data'
The key problem Spark resolves is access to data across the enterprise. IBM initiatives include providing courses to train 1 million data scientists and engineers to use it.
IBM is making a major commitment to the future of Apache Spark, with a series of initiatives announced today. IBM will offer Apache Spark as a service on Bluemix; commit 3,500 researchers to work on Spark-related projects; donate IBM SystemML to the Spark ecosystem; and offer courses to train 1 million data scientists and engineers to use Spark.
The commitment to Spark is "right in the heart of what [IBM] has been doing," said Rob Thomas, VP for product development for IBM Analytics, in an interview. That database heritage hearkens back to earlier commitments to Linux, and even further back to IBM's DB2 database product, he said. But it is rare for IBM to make a technological bet such as Spark, he added.
"This is the future of enterprise data," Thomas continued. "Anyone using data will have to leverage Spark."
The key problem Spark resolves is access to data across the enterprise. A typical large corporation will have hundreds, if not thousands, of data sets residing in different databases across its IT systems.
A data scientist can certainly craft an algorithm to plumb the depths of any database. But "it takes a data scientist 90 days of work" to craft that algorithm, Thomas said. "Today, if you port it to another system, you are talking about another 90 days of work" to re-craft and adjust that algorithm in order to get it to work. Spark "eliminates that second 90 days," he said. A Spark-based system can seamlessly and transparently access and analyze any database, without additional development and delay.
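The portability Thomas describes can be pictured with a small sketch. This is plain Python, not Spark's API, and every name in it (`average_by_region`, `rows_from_sqlite`, `rows_from_csv`, the `sales` table) is hypothetical. It only illustrates the general idea: the analysis is written once against a uniform row format, and each data source needs no more than a thin adapter.

```python
import csv
import io
import sqlite3
from collections import defaultdict


def average_by_region(rows):
    """The analysis, written once: mean sale amount per region."""
    totals = defaultdict(lambda: [0.0, 0])  # region -> [sum, count]
    for region, amount in rows:
        totals[region][0] += float(amount)
        totals[region][1] += 1
    return {region: s / n for region, (s, n) in totals.items()}


def rows_from_sqlite(conn):
    """Adapter for a relational source (hypothetical 'sales' table)."""
    return conn.execute("SELECT region, amount FROM sales").fetchall()


def rows_from_csv(text):
    """Adapter for a flat-file source with the same two columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [(r["region"], r["amount"]) for r in reader]


# Two unrelated "systems" holding similar data (made-up sample values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 30.0), ("west", 5.0)],
)
csv_text = "region,amount\neast,4\nwest,6\nwest,8\n"

# The same analysis function runs unchanged against both sources.
from_db = average_by_region(rows_from_sqlite(conn))
from_csv = average_by_region(rows_from_csv(csv_text))
print(from_db)
print(from_csv)
```

In this toy version, moving the analysis to a new source means writing one short adapter rather than reworking the algorithm, which is roughly the "second 90 days" Thomas says Spark eliminates.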
Another virtue Spark possesses is ease of use. Developers can concentrate on building the solution, instead of building an engine from scratch.
IBM recently sponsored a hackathon during which more than 100 teams crafted new Spark-based apps in about 10 days. One team built a genomic cloud system to analyze DNA samples; another created a search engine to gauge public opinion based on sentiments perceived in text. Thomas pointed to these projects as "proof of concept" to show how quickly a competent team of two or three people can complete a project using Spark.
"The weakest part of Spark is the machine learning piece," Thomas noted. To that end, IBM will make available its SystemML machine learning technology to add learning capability to Spark apps, working with partner Databricks. This is not an algorithm library, but an engine that understands algorithms, Thomas said of SystemML.
While Spark looks promising, nothing will come of it without sufficient numbers of data scientists who actually use it. And data scientists don't grow on trees. IBM wants to educate about 1 million new users through a series of partnerships with AMPLab, DataCamp, MetiStream, Galvanize, and the Big Data University MOOC. The goal here is to make available a "data scientist's work bench" where users who know the R programming language can pick up Spark and its uses very quickly, Thomas said.
Ultimately, it falls to enterprises to make the best use of big data technology such as Spark. "Knowing the problem to solve—that will drive significant business value," Thomas said. CEOs are only beginning to understand how their data can be put to best use. Thomas offered the example of Moneyball, the 2003 book on how the Oakland Athletics sharpened their play of baseball through statistical analysis. "Data can make you think differently," Thomas said. And therein lies the quest for the advantages of insight.
William Terdoslavich is an experienced writer with a working understanding of business, information technology, airlines, politics, government, and history, having worked at Mobile Computing & Communications, Computer Reseller News, Tour and Travel News, and Computer Systems ...