12 Hadoop Vendors To Watch In 2012
Hadoop has been called the next-generation platform for data processing because it offers low cost and the ultimate in scalability. But Hadoop is still immature and will need serious work by the community--including the 12 vendors described here--to turn this fledgling baby elephant into an industry colossus.
Hadoop is at the center of this decade's big data revolution. This Java-based framework is actually a collection of software and subprojects for distributed processing of huge volumes of data. The core approach is MapReduce, a technique used to boil down tens or even hundreds of terabytes of Internet clickstream data, log-file data, network traffic streams, or masses of text from social network feeds.
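To make the MapReduce idea concrete, here is a minimal in-memory sketch of how a pile of clickstream records can be boiled down to a count per page. It is an illustration only, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the "user page" record format are assumptions made for this example, and a real Hadoop job would spread the same three steps across many machines.

```python
from collections import defaultdict

# Hypothetical record format for this sketch: "<user> <page>" per line.

def map_phase(record):
    """Map: emit a (page, 1) pair for a single clickstream record."""
    _user, page = record.split()
    yield page, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)

def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

if __name__ == "__main__":
    clicks = ["alice /home", "bob /home", "alice /cart"]
    print(run_job(clicks))  # {'/home': 2, '/cart': 1}
```

The map and reduce steps never share state, which is what lets a framework such as Hadoop run them in parallel over separate chunks of the data.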
Excitement has been building around Hadoop since its release as an Apache open source project in 2008, thanks to its combination of low cost, scalability, and flexibility to handle any data without building predefined schemas. Many people see in Hadoop the potential to usher in a whole new generation of data-processing capabilities, just as Structured Query Language (SQL) ushered in a revolution in data computing more than 30 years ago.
But Hadoop is immature and, in some ways, downright crude compared to SQL. Pioneers, most of whom started working on the framework at Internet giants such as Yahoo, have already put at least six years into developing Hadoop. Success, however, has brought mainstream demand for stability, robust administrative and management capabilities, and the kind of rich functionality available in the SQL world.
All eyes are now on Hadoop vendors, a fast-growing community, to deliver robust tools, capabilities, and innovations. Leading lights in that community include Cloudera and Amazon Web Services. Cloudera was the first and is now the largest source of Hadoop software with its CDH distribution and accompanying management software. It's also the largest provider of enterprise support and training for Hadoop. Amazon was an early mover in running Hadoop in a public cloud with its Amazon Elastic MapReduce service.
In 2011, MapR and Hortonworks, the latter a Yahoo spinoff, burst onto the scene with announcements about their own distributions of Hadoop software along with support, training services and, in MapR's case, proprietary twists aimed at delivering high performance. Competition is part of what it will take to improve Hadoop, so the availability of more distributions and of new support and training options should benefit everyone.
Data processing is one thing, but what most Hadoop users ultimately want to do is analyze the data. Enter Hadoop-specialized data access, business intelligence, and analytics vendors such as Datameer, Hadapt, and Karmasphere.
The clearest sign that Hadoop is headed mainstream is the fact that it was embraced by five major database and data management vendors in 2011, with EMC, IBM, Informatica, Microsoft, and Oracle all throwing their hats into the Hadoop ring. IBM and EMC released their own distributions last year, the latter in partnership with MapR. Microsoft and Oracle have partnered with Hortonworks and Cloudera, respectively. Both EMC and Oracle have delivered purpose-built appliances that are ready to run Hadoop. Informatica has extended its data-integration platform to support Hadoop, and it's also bringing its parsing and data-transformation code directly into the environment. Read on to learn more about what these influential vendors are doing with Hadoop.