Hadoop is open source software built for cloud-scale data processing. It's already in heavy use at Yahoo and Amazon.com, where it runs on large server clusters. Hadoop provides a way to store a large data set across many disks, then read it back from all of them in parallel, in a single pass that extracts and analyzes key information.
Its power lies in the fact that it can do so much faster than standard database operations allow. On a terabyte of data, Hadoop can return results in seconds or minutes, as opposed to the many hours of processing that the same job might take inside a data warehouse.
Hadoop includes a cloud-oriented function, MapReduce, which maps data across a cluster to be analyzed by processors close to where the data is located. This function is part of what reduces the time it takes to process a large data set.
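The map-then-reduce idea can be illustrated with a minimal sketch in plain Python. This is not Hadoop's API and nothing here comes from Cloudera; the function names and the sample data are hypothetical, and the partitions are processed one after another, where a real cluster would run the map step in parallel on the machines that hold each piece of data.

```python
from collections import defaultdict


def map_phase(partition):
    """Emit a (word, 1) pair for every word in one partition of the data."""
    return [(word.lower(), 1) for line in partition for word in line.split()]


def shuffle(mapped_outputs):
    """Group the emitted values by key across all partitions."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values collected for each key into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(partitions):
    # Assumption: each partition stands in for data held on one cluster node.
    # Here the map step runs sequentially; a cluster would run it in parallel.
    mapped = [map_phase(partition) for partition in partitions]
    return reduce_phase(shuffle(mapped))


# Hypothetical data set, split into two partitions as if stored on two nodes.
partitions = [
    ["hadoop stores data", "hadoop maps data"],
    ["reduce gathers data"],
]
counts = word_count(partitions)
print(counts["data"], counts["hadoop"])  # prints "3 2"
```

Because each partition is mapped independently, the work can be spread over as many machines as there are pieces of data, which is where the time savings described above come from.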
Hadoop use thus far has been the province of heavy-hitting computer science PhDs at major Internet companies, such as Google, which analyzes the content of the Internet. Cloudera is trying to make Hadoop's analysis powers available to the average business analyst.
"We have built Cloudera Desktop to ease Hadoop adoption outside its birthplace," explained Mike Olson, Cloudera CEO, in an interview.
Olson is the former CEO of Sleepycat, supplier of the open source BerkeleyDB embeddable database, now owned by Oracle; he served as Oracle's VP of embedded databases for two years after the acquisition. Cloudera was founded to provide technical support to Hadoop users and to broaden Hadoop's use.
"Hadoop is a flexible data storage platform. You can do flexible analysis with it," said Jeff Hammerbacher, Cloudera's VP of products, in an interview. He is the former head of the data team at Facebook, which used the massive amounts of statistics generated on the Facebook site to analyze what users did with the site and to decide what features to produce next.
The Cloudera Desktop aids the task of putting Hadoop to work by supplying four applications. The Desktop's File Browser enables copying and browsing large data files stored on a cluster. Its job submission app, Job Designer, can be used to define a Hadoop job, run it and save it for future reuse. The Job Browser app lets a Hadoop user track the progress of an analysis job. And the Cluster Health dashboard tells the Hadoop user whether all is well with the machine cluster on which Hadoop is running; it can alert system administrators if the cluster is running into a problem.
Roughly equivalent functionality can be obtained through the use of the Apache open source code. Cloudera has moved that functionality from a command line interface to an easier-to-adopt graphical user interface. "We expect it to drive new use of Hadoop," said Hammerbacher.
All four applications run in a user's Web browser, and can run on Windows, Linux or Apple Macintosh machines.
Cloudera received $6 million in a second round of funding from Greylock Partners after receiving $5 million in first-round funding. Its individual angel funders include Diane Greene, former CEO of VMware, and Marten Mickos, former CEO of MySQL AB.