Hadoop is open source code that is a distinct, cloud-oriented technology. It's already in heavy use at Yahoo and Amazon.com, where it runs on large server clusters. Hadoop provides a method of storing a large data set across many disks, then gathering it in a simultaneous, parallel extraction for a pass to analyze key information.
Hadoop includes a cloud-oriented function, MapReduce, which maps data across a cluster to be analyzed by processors close to where the data is located. This function is part of what reduces the time it takes to process a large data set.
Hadoop use thus far has been the province of heavy hitting computer science PhD's at major Internet companies, such as Google, as it analyzes the content of the Internet. Cloudera is trying to make Hadoop's analysis powers available to the average business analyst.
"We have built Cloudera Desktop to ease Hadoop adoption outside its birthplace," explained Mike Olson, Cloudera CEO, in an interview.
Olson is the former CEO of Sleepycat, supplier of the open source BerkeleyDB embeddable database, now owned by Oracle; he served as VP of embedded databases for two years after the acquisition. Cloudera was founded to become a company that provides technical support to Hadoop users and increases its use.
"Hadoop is a flexible data storage platform. You can do flexible analysis with it," said Jeff Hammerbacher, VP of products, in an interivew. He is the former head of the data team at Facebook, which used the massive amounts of statistics generated on the Facebook site to analyze what users did with the site and what features to produce next.
The Cloudera Desktop aids the task of putting Hadoop to work by supplying four applications. The Desktop's File Browser enables copying and browsing large data files stored on a cluster. Its job submission app, Job Designer, can be used to define a Hadoop job, run it and save it for future reuse. The Job Browser app lets a Hadoop user track the progress of an analysis job. And the Cluster Health dashboard tells the Hadoop user whether all is well with the machine cluster on which Hadoop is running; it can alert system administrators if the cluster is running into a problem.
Roughly equivalent functionality can be obtained through the use of the Apache open source code. Cloudera has moved that functionality from a command line interface to an easier-to-adopt graphical user interface. "We expect it to drive new use of Hadoop," said Hammerbacher.
All four applications run in a user's Web browser, and can run on Windows, Linux or Apple Macintosh machines.
Cloudera received $6 million in a second round of funding from Greylock Partners after receiving $5 million in first round funding. Its individual angel funders include Diane Greene, former CEO of VMware, and Marten Mickos, former CEO of MySQL AB.
InformationWeek and Dr. Dobb's have published an in-depth report on how Web application development is moving to online platforms. Download the report here (registration required).
Stay connected and informed by visiting our Enterprise IT Community!

Become a member today for instant access to free InformationWeek research, expert advice, peer perspectives, and more on the following topics:
- Application Performance Management (APM)
- Security Management
- Mainframe 2.0
- IT Automation
- Service Assurance
Also, visit our Government, Retail and Financial Services groups to see how these technologies apply specifically to those industries.
NOTE: Offer valid for U.S., U.S. possessions, & Canada only.