The term "big data" reached its peak on Google Trends back in 2015. Organizations were harvesting more data than ever before and needed to store it in a cost-effective way, which may be why searches for "Hadoop" also reached a peak on Google Trends that same year.
But by 2018, things have shifted. The three big Hadoop vendors -- Cloudera, Hortonworks, and MapR -- no longer promote themselves as Hadoop providers. At the Gartner Data Analytics Summit this spring, Research VP Merv Adrian pointed out that none of those vendors even had the word Hadoop in their booth displays.
Now the focus is on the analytics and machine learning aspects of data. Cloudera has changed its positioning to this: "A modern platform for machine learning and analytics, optimized for the cloud" -- a change the company made about a year and a half ago, according to Wim Stoop, senior product marketing manager at Cloudera. He spoke with InformationWeek in a recent interview.
Stoop said that until about two years ago, the market's big focus had been on how to keep more data, and more types of data, for longer periods of time. How do you store it all? But as organizations mastered that task, another challenge emerged -- now that we can store it, what do we do with it?
"Hence the focus on machine learning and analytics," he said.
These days Cloudera, Hortonworks, and MapR promote themselves as platforms for analytics, data science, and machine learning, incorporating many of the open source technologies in one place to make them easier for enterprises to consume. (MapR describes itself as a converged data platform integrating Hadoop, Spark, and Apache Drill along with other data technologies. Hortonworks describes itself as a connected data platform and solution.)
All these companies have repositioned themselves as providing much more than just open source storage technologies for big data needs. That's a move echoed by the changing focus of what enterprise organizations want to do with their data programs.
These data platform companies incorporate multiple open source technologies for storing, managing, and performing advanced analytics on big data. They are also working to make these technologies easier for organizations to consume by offering elastic cloud options for their services.
For instance, at Strata Data London last week, Cloudera announced plans to expand its Altus data science platform-as-a-service offering to the Azure cloud. The service has already been available on AWS for the past year. Altus Data Engineering for Azure simplifies and speeds ETL, data processing, and batch machine learning by reducing complexity, Cloudera said. Azure customers can also use the shared data catalog capabilities in Cloudera Altus SDX, currently in beta. Cloudera said that this is designed to preserve the business metadata and security and governance policies so they can be applied consistently across data processing and analytics workloads in the cloud.
Cloudera Altus Analytic DB, a data warehouse cloud service, will also now be available in Azure.
Cloudera has also updated its Data Science Workbench and the Cloudera Enterprise platform. The workbench update lets data scientists run and track versioned experiments and also more easily deploy models as REST APIs, according to Cloudera.
While the technology for achieving better results with data is arriving, many organizations still have work to do in terms of their own data organizations and processes, Stoop told me. Perhaps that is the next step.
"Many organizations are not yet seeing data as a strategic asset," he said. "They are treating much of it on a departmental and siloed basis. They need to change how they are working with that data… This is not something that happens overnight."