IBM has introduced an integrated development environment called the Data Science Experience, designed to build on the $300 million investment the company made in the Apache Spark ecosystem in 2015.
The project, which was announced Tuesday, is an extension of IBM's commitment to what IBM Analytics product development VP Rob Thomas calls "the analytics operating system," referring to Spark. The announcement is timed to coincide with this week's Spark Summit in San Francisco.
"I've asserted that everybody would be using Spark in the future, and that seems to be coming true even faster than we thought," he told InformationWeek in an interview. Thomas said that the Data Science Experience is the first enterprise app for this analytics operating system.
"This is an experience that is optimized and built on Spark," Thomas said. "You can think of it as the first integrated development environment for real-time and high-performance analytics. We are enabling data scientists to build machine learning apps with Apache Spark and do that regardless of their skill set or their tool of choice."
The technology is an extension of the open source project Jupyter Notebook, a web application that allows data scientists to create and share documents that contain live code, equations, visualizations, and explanatory text. The Jupyter technology is used for data cleaning and transformation, numerical simulation, statistical modeling, machine learning, and more, according to the project's home page. Thomas said the Data Science Experience also leverages Spark and SystemML, which is the machine learning optimizer that IBM contributed to open source last year.
"When you build this type of environment on open source suddenly you can start to integrate a collaboration across a variety of different areas," he said. "That's also why we also are announcing an ecosystem initiative with a number of partners who are launching with us including H2O, RStudio, and Lightbend. Because of this open framework, we can bring a whole ecosystem of partners to this."
IBM's Data Science Experience is a cloud-based development environment that consolidates multiple open source tools, including RStudio, Python, libraries from machine learning startup H2O.ai, and Notebooks. Developers can keep using the tools they know while collaborating with colleagues who use different ones. The goal is to help developers get their applications into production faster.
The Data Science Experience also builds on IBM's existing Data Scientist Workbench, which offers connections to multiple data sources and has more than 7,000 registered users.
Thomas said the Data Science Experience turns data science into a team sport.
"No matter what type of skill you have, whether it's R, or Python, or Scala, or SPSS, you can work in the Data Science Experience," he said. "You can collaborate and share datasets and collaborate and share models, and it doesn't require you to know the other languages."
Collaboration platforms like this one are new to the data science world, Thomas said. They haven't existed in the past because so many tools were proprietary.
"We think this will really change the adoption of data science and machine learning in every enterprise," he said.
IBM first signaled its big commitment to Spark a year ago when Thomas told InformationWeek that Spark "is the future of enterprise data." In November, IBM's SystemML was accepted into the Apache Incubator.