Cloud
Commentary
8/6/2010
12:00 PM
Charles Babcock

Cloudera Gives Hadoop A User Interface, Deployment Tools

Hadoop, the system that takes advantage of a large cluster for sorting masses of data, looks like a rapidly evolving piece of cloud software. Cloudera, the company that includes original Hadoop author Doug Cutting, recently issued a Hadoop front end and deployment system to make it easier to use.

Cutting founded the Hadoop project while working at Yahoo. It's now an Apache open source project, with Cutting, many of his fellow Cloudera developers and his former fellow developers at Yahoo all contributing to it. Hadoop consists of MapReduce, the cluster mapping system that pulls data off a server disk and maps it to the closest available CPU, and the Hadoop Distributed File System, which partitions a large file set across a cluster. Both make use of parallel processing to tap the potential of the cluster, and both can tolerate a hardware failure and route around it.
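To make the map-and-reduce idea concrete, here is a minimal single-machine sketch in Python. It is not Hadoop's API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the sketch ignores the parts that make Hadoop useful at scale, namely spreading chunks across a cluster, placing work near the data and routing around failed hardware.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key so each key is reduced once."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(chunks):
    """Run map, shuffle and reduce over a list of text chunks.

    In a real cluster each chunk would live on a different node;
    here they are simply items in a list (an illustrative assumption).
    """
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    chunks = ["Hadoop sorts data", "Hadoop tolerates failure", "data data"]
    print(word_count(chunks))
```

Because each call to `map_phase` depends only on its own chunk, the map step could in principle run on many machines at once, which is the property the article attributes to Hadoop's parallel processing.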

It was Hadoop, of course, that took four terabytes of scanned archives from the New York Times and converted them to PDFs for display on the Times' Web site. It accomplished the task in less than 24 hours, using 100 machines in the Amazon EC2 cloud. This was one of the incidents that started to give cloud computing a good name back in 2007.
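A quick back-of-the-envelope calculation shows what those figures imply for each machine. The sketch below assumes decimal terabytes, an even split of the archive across all 100 machines and the full 24 hours of run time. None of those details come from the Times project itself, so treat the result as a rough floor on throughput, not a measurement.

```python
TOTAL_BYTES = 4 * 10**12   # four terabytes, assuming decimal units
MACHINES = 100             # EC2 machines cited in the article
HOURS = 24                 # upper bound on the run time cited in the article


def per_machine_load(total_bytes, machines, hours):
    """Return (bytes handled per machine, average bytes per second per machine).

    Assumes the data is split evenly, which is an idealization.
    """
    bytes_per_machine = total_bytes / machines
    seconds = hours * 3600
    return bytes_per_machine, bytes_per_machine / seconds


if __name__ == "__main__":
    share, rate = per_machine_load(TOTAL_BYTES, MACHINES, HOURS)
    print(f"{share / 10**9:.0f} GB per machine")
    print(f"{rate / 10**6:.2f} MB/s per machine, at minimum")
```

Under these assumptions each machine handles about 40 GB. Since the job finished in less than 24 hours, the actual average rate per machine was higher than the figure this prints.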

Mike Olson, CEO of Cloudera and former head of the company behind BerkeleyDB, says the June 29 launch of Cloudera Enterprise was intended to extend Hadoop use beyond skilled Java programmers to a broader set of users. Currently, it takes a programmer to feel comfortable with Hadoop's command line interface. With Cloudera Enterprise, a Hadoop administrator gets graphical tools to "monitor, manage and control access to a Hadoop cluster," including the means to provision new servers for the cluster, accept identities supplied by Active Directory or LDAP identity management systems, and connect Hadoop to various systems monitoring tools, Olson said in an interview.

The goal is to smooth the deployment of Hadoop to take on the task of sorting and managing the masses of data being generated on Web sites, on trading exchanges and in scientific research projects. "Managing hundreds of machines in a cluster is always a problem," Olson said, and Hadoop users need all the help they can get to make use of the growing reams of data available to them.

In effect, Cloudera Enterprise is the Cloudera distribution of Hadoop itself, a production-tested version, combined with the tools and the user interface Cloudera has been able to layer on top. The company has rolled other open source code used with Hadoop into the package, such as Pig, the Hadoop programming language, and Hive, the data warehouse system built on Hadoop.

The announcement of Cloudera Enterprise didn't roil the waters all that much. Cloudera was expected to bring out a front-end set of management tools, and it did so at the Hadoop Summit held at Yahoo June 29. New users of these tools are likely to push Hadoop forward into a larger presence in cloud computing and monumental Web data handling tasks.

eBay is a major user of Hadoop, and Anil Madan, its director of engineering for analytics platform development, said Cloudera Enterprise is a welcome addition to his daily task of coping with a mountain of data. "These new tools make it easy to perform critical activities including user access, authorization and lifecycle management of end user jobs," he said in the announcement.

Hadoop is available for free download from the Apache Software Foundation, where it has been a top-level project since 2008, though it remains at an early stage. A production version of Hadoop is also distributed free by Yahoo, which makes use of the system itself.


