Hadoop At 10: Milestones And Momentum
Hadoop, an open source framework for wrangling unstructured data and analytics, celebrated its 10th birthday in January. Here's a look at the milestones, players, and events that marked the growth of this groundbreaking technology.
![](https://eu-images.contentstack.com/v3/assets/blt69509c9116440be8/blt4f8dc0c5391176fb/64cb415bc712e5cf37c59bf6/Hadoop-at-10.png?width=700&auto=webp&quality=80&disable=upscale)
The year was 2006. Facebook was a two-year-old startup company, run by a 21-year-old in a hoodie. Some entrepreneurs had joined together to launch a new social media service called Twitter, and in December the world was still six months away from the release of the iPhone. It was a different time.
Consumers weren't carrying around sensor-laden, camera-equipped data collection devices everywhere they went and posting every thought, emotion, and meal to social media. When companies thought about data, they thought about structured data in ERP and CRM systems, and how they could create better business intelligence reports for executives.
It was in this environment that a new technology called Hadoop was born. It started as a framework to support an open source search engine project called Nutch. Nutch's creators needed a way to store and process the massive amounts of data their search engine collected, so they built a new software framework inspired by two papers published by growing Silicon Valley upstart Google, describing the Google File System and MapReduce.
[Check out our interview with Hadoop creator Doug Cutting as he talks about what this birthday means for big data and the software framework. Read Hadoop At 10: Doug Cutting On Making Big Data Work.]
These two developers, Doug Cutting and Mike Cafarella, eventually joined Yahoo, which at the time was struggling to hold its lead in site visits against upstart Google. At Yahoo, their work on the distributed file system and parallel processing framework was named Hadoop, after a stuffed toy elephant that belonged to Cutting's son. Yahoo eventually handed Hadoop over to the open source Apache Software Foundation, and work continued on making this not-quite-ready-for-prime-time distributed storage and processing system more scalable.
Today, Hadoop has entered a new stage. Numerous improvements, as well as add-on projects, have turned the software framework into a powerful tool used by a number of big companies, including Facebook, Twitter, eBay, and Salesforce. Hadoop indeed seems to be getting ready for prime time.
In the Forrester Wave: Big Data Hadoop Distributions, Q1 2016 report, the analyst firm said: "Enterprise Hadoop is a market that is not even 10 years old, but Forrester estimates that 100% of all large enterprises will adopt [Hadoop and related technologies such as Spark] for big data analytics within the next two years."
To celebrate Hadoop's 10-year anniversary, come with us as we look back at some of the milestones, key players, and important developments in Hadoop's history.
Are you a Hadoop user? Is it something you're considering for your enterprise? Is there an important milestone we missed? Tell us all about it in the comments section below.
Cloudera is one of three companies that provide the main commercial distributions of open source Hadoop.
The company was founded in 2008 by three engineers from Google, Yahoo, and Facebook, along with a former Oracle executive, and released its first Hadoop distribution in 2009. One of the creators of Hadoop, Doug Cutting, serves as chief architect at the Palo Alto, Calif.-based company.
Cloudera received funding from a number of Silicon Valley regulars, and in March 2014 chip giant Intel took a large stake in the company, becoming its largest shareholder.
MapR, another of the big-three Hadoop distribution companies, is based in San Jose, Calif., and was founded in 2009 by CEO John Schroeder and CTO M.C. Srivas. The company has received a total of $170 million in funding to date from investors including Google Capital. Schroeder has said in several interviews that the company is considering an initial public offering in the near future. MapR recently hired Oracle veteran Matt Mills as its COO and president. Technology partners include Google, EMC, and Talend. The company made its Hadoop distribution available through the AWS marketplace in September 2015.
Hortonworks, another of the big-three Hadoop distributors, was spun out of Yahoo in 2011 as an independent company. Yahoo and Benchmark Capital seeded the company with $23 million in venture funding at the time of the spinoff. Eric Baldeschwieler, who had served as VP of Hadoop development at Yahoo, was the founding CEO, although he left the company in 2013.
The Santa Clara, Calif.-based company has forged partnerships over the years with Microsoft on its Azure cloud platform and with SAP.
In December 2014, Hortonworks raised $100 million in an initial public offering.
The Forrester Wave report identified two other Hadoop distributions -- IBM BigInsights and EMC's Pivotal HD Hadoop distribution. Both are Hadoop efforts within much larger companies that serve many other markets beyond big data. Forrester identified the big three and IBM as leaders, and said Pivotal was a strong performer.
While Cloudera, Hortonworks, and MapR had all released their own distributions of Hadoop already, the Apache Software Foundation didn't announce version 1.0 of the technology until January 2012. It announced the news in a blog post: "The project's latest release marks a major milestone six years in the making, and has achieved the level of stability and enterprise-readiness to earn the 1.0 designation."
The year 2012 also marked the first Strata + Hadoop World conference, now a series of events hosted around the world. Attendees interested in data science learn about new developments, product launches, case studies presented by user organizations, and more. The conference started with three events in Silicon Valley, New York, and Europe (which are now annual events), and has since expanded to include Singapore last year and Beijing this year.
Like most new technologies, Hadoop garnered a lot of initial excitement, but it was not easy to use and few people had the skills required. It was also considered slow.
Enter Spark. The technology was designed to enhance the Hadoop stack. It lets developers write applications in Java, Python, Scala, or R. And it can run programs up to 100x faster than Hadoop MapReduce in memory or 10x faster on disk, according to the Apache Software Foundation.
Spark is now the most active open source project in big data, with more than 600 contributors in the last 12 months. And Spark has generated excitement among plenty of experts and technology companies pushing the development of the big data stack.
"The fact that Spark had a single programming model, and the ability to analyze all types of data from all sources of data, positioned it to have the impact in the industry that something like Linux did at the turn of the century," Rob Thomas, IBM Analytics VP of product, told InformationWeek. "Linux is an operating system for systems and computers. Spark will be the operating system around analytics and how data will be accessed."
Spark became a top-level Apache Software Foundation project in 2014.
Enterprises are interested in Hadoop, but it hasn't been a big priority for them in the past. That may shift as more organizations look to pivot to real-time digital offerings.
Last year, Gartner conducted the Hadoop Adoption Study, which included a survey of 284 Gartner Research Circle members. According to the report, only 26% of respondents said they were either deploying, piloting, or experimenting with Hadoop. Among the others, 11% said they planned to invest within 12 months, and 7% said they planned to invest within 24 months.
Today, Hadoop underlies some of the most successful digital native startups, and Uber is a popular example. Other organizations using Hadoop include AOL, Facebook, IBM, LinkedIn, and Twitter.
Hadoop is also being used to improve medical outcomes, power online dating services, and more, according to co-creator Doug Cutting, who tells the story in this video celebrating the technology's 10-year anniversary. Cutting said he believes Hadoop is now in its adolescence, and that it is the product of open source community development. And the interest is real. A quick look at Google Trends shows that searches for Hadoop have grown over the last 10 years as searches for Business Intelligence have dropped.