It may seem like the story of Hadoop hit a dramatic climax this year as big data workloads went to public cloud giants like AWS and one of the original purveyors of the open source big data technology stack, MapR, narrowly averted shutting down when HPE acquired it.
But the real story is one of gradual evolution. Hadoop started out as a technology stack for managing big data, but in the years since the term "Hadoop" faded as the hot tech buzzword, it has become something more -- "a movement toward a modern architecture for managing and analyzing data," said Arun Murthy, chief product officer at Cloudera and a co-founder and former CPO of Hortonworks, in a post on Medium this week titled "Hadoop is Dead. Long live Hadoop."
That's what Cloudera, one of the original three Hadoop distribution companies and the only one remaining (Hortonworks and Cloudera announced their plans to merge a year ago), has envisioned as it has built its technology stack for enterprise customers. It's not just Hadoop anymore. The plan is for a collection of open source technologies made available in the cloud to enterprise customers. Getting there has been a gradual shift. All three of the original Hadoop companies -- Cloudera, Hortonworks, and MapR -- had been moving away from marketing themselves as "Hadoop" companies for several years already. The Strata + Hadoop Conference changed its name to the Strata Data Conference in 2017. Now a new vision is emerging of large-scale data platforms made up of open source components and based in the cloud.
Since the merger, Cloudera has been working to fulfill that vision with its Cloudera Data Platform. The company delivered an initial release of the technology and reported that a select group of customers was evaluating its services in a public cloud deployment. In its Q2 earnings release on September 4, Cloudera said that its sales pipeline had improved from a disappointing Q1 and that the company is on target for the year. A subsequent version of the technology, expected later this year, will be available for on-premises and alternative cloud deployments.
The company also announced on September 4 plans to acquire Arcadia Data, a provider of cloud-native AI-powered business intelligence and real-time analytics. (A value-based physician and hospital network leveraged Arcadia's technology for its value-based care program.) Cloudera said that it expects Arcadia's technology to help accelerate time-to-insight for customers through enhanced self-service access to data and improved analytics response times.
All these moves lead back to the idea of Hadoop as a philosophy, as laid out by Murthy in that Medium blog post. He said the philosophy comprises the following tenets:
- The architecture is a disaggregated software stack in which each layer -- storage, compute platform, and compute frameworks for batch, real-time, and SQL -- is built so that the layers can be snapped together like "Legos." That architecture is distinct from a database's custom storage format, parser, and execution engine.
- The philosophy also calls for leveraging commodity hardware for large-scale distributed systems, getting away from proprietary/monolithic hardware-plus-software stacks.
- A move toward leveraging open data standards and open source technology, and a move away from proprietary tech that is controlled by vendors.
- A move toward a flexible and ever-changing ecosystem of technologies, from YARN to MapReduce, to Spark/Flink, to whatever else comes next. Murthy said that this move away from monolithic stacks enables innovation to happen at every layer.
Public cloud will play a key role in the deployment of this architecture because it reflects how enterprise hardware infrastructure has become a commodity. But there's another element to the cloud deployment, too.
"The fundamental goal of CDP (Cloudera Data Platform) is to ensure that as a cloud service we make it much simpler for enterprises to derive value from the platform without dealing with the complexities of the powerful technologies," Murthy wrote.