Over the 10 years since its creation, Apache Hadoop has moved from a fledgling technology championed by open source advocates to a platform that has become increasingly mainstream in enterprise IT shops.
In a keynote address on March 30 at the Strata + Hadoop conference in San Jose, Doug Cutting, Hadoop creator and chief architect at Hadoop distributor Cloudera, provided an informal State of Hadoop and Big Data address, looking back at the last 10 years and forward to what the future may hold for big data.
"It used to be different between open source and enterprise," Cutting told attendees during an address that led off the morning keynotes -- with open source "hippies" attending O'Reilly conferences while enterprise IT focused their attention and budgets elsewhere. "Now we've seen a merger of these communities -- enterprise and hacker."
Several factors combined to bring big data and Hadoop to this moment, he said, including the inexpensive hardware driven by the PC revolution and the open source community that created standards and turned these platforms into something that people could use at a very low cost.
"We had all the ingredients to really begin this change to ignite this revolution," he said. "Hadoop was the first to combine this into a single system."
The core elements of that system have remained essentially the same over the past 10 years -- the HDFS storage system, the MapReduce execution engine, and later the YARN scheduler. But over those 10 years more technologies have been introduced to improve Hadoop, including Apache Spark, which many organizations now use in place of MapReduce.
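The MapReduce model at the heart of Hadoop can be illustrated with the classic word-count example. The sketch below is plain Python rather than Hadoop's actual Java API: a map phase emits (key, value) pairs, and a reduce phase groups by key and aggregates. In a real cluster, both phases run in parallel across many machines, with a shuffle step routing each key to one reducer.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle + reduce: group pairs by key and sum the counts
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["Hadoop scales out", "Spark replaces MapReduce", "Hadoop and Spark"]
result = reduce_phase(map_phase(docs))
```

Spark expresses the same computation as chained transformations on in-memory datasets, which is part of why it can outperform disk-based MapReduce for iterative workloads.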
"Technologies have developed around the [Hadoop] kernel," Cutting said. "And that's what will survive longer than the Hadoop project itself. A new family of technology has arrived, and a great example of that is Spark… Spark came out of the University of California, Berkeley. It didn't come out of a business. It came about because folks found it useful. We are seeing this again and again."
That pattern will repeat across the Hadoop ecosystem, Cutting said. There is competition for the best technologies at the storage level, at the query level, and in other areas, too, and as new technologies arrive, they will improve the whole.
"Hadoop's legacy is creating a new way of developing an ecosystem with collaboration," he said.
Today the hardware and software needed to run Hadoop is available at a much lower cost, and the system itself is much more scalable, Cutting said, with systems regularly scaling to tens of petabytes.
The technology is part of what is driving changes across all industries as they move to digital operations and customer service.
"Banks, insurance companies, manufacturers, retailers, and healthcare providers are adopting data technologies not at the periphery, but at the center of the business," Cutting said. "Data is becoming the fundamental driver of economic growth for the century."
Cutting provided a few predictions for Hadoop and big data in the next 10 years, too. Beyond the software stack, he said that he believes big data will get a boost from improvements to computer hardware. For instance, he said, Intel has created 3D XPoint, a new class of non-volatile memory that is substantially faster than flash storage.
"We've grown up with systems where the primary bottleneck was I/O," he said. "We are going to have the majority of data sets stored in memory, and that's going to change the applications that we can build."
Cutting also said that cloud computing has reached maturity, noting that Amazon Web Services (AWS) launched at around the same time as Hadoop and has gone through a similar adoption curve. More companies are now storing data in the cloud, he said.
"But the biggest change in the next 10 years is not going to be something I can predict, but will be things that you are involved in," he told attendees. "We now have a system that is in your hands. It is being created with your input. You can make a difference here. More so than ever before."