Cloud
Commentary
8/6/2010
12:00 PM
Charles Babcock

Cloudera Gives Hadoop A User Interface, Deployment Tools

Hadoop, the system that harnesses a large cluster to sort masses of data, looks like a rapidly evolving piece of cloud software. Cloudera, the company that includes original Hadoop author Doug Cutting, recently issued a Hadoop front end and deployment system to make it easier to use.

Cutting founded the Hadoop project while working at Yahoo. It's now an Apache open source project, with Cutting, many of his fellow Cloudera developers, and his former colleagues at Yahoo contributing to it. Hadoop consists of MapReduce, the cluster mapping system that pulls data off a server disk, then maps it to the closest available CPU, and the Hadoop Distributed File System, which partitions a large file set across a cluster. Both use parallel processing to tap the potential of the cluster, and both can tolerate a hardware failure and route around it.
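To make the division of labor concrete, here is a toy, single-machine sketch of the pattern described above: the input is partitioned into blocks, each block is mapped independently, and the results are grouped and reduced. It is not Hadoop's Java API. The function names (`partition`, `map_block`, `shuffle`, `reduce_groups`, `word_count`) and the word-count task are illustrative assumptions, and a thread pool stands in for the cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_blocks):
    """Split records into num_blocks blocks (a stand-in for file system blocks)."""
    return [records[i::num_blocks] for i in range(num_blocks)]


def map_block(block):
    """Map step: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_blocks):
    """Group the emitted values by key across all blocks."""
    groups = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_groups(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records, num_blocks=3):
    blocks = partition(records, num_blocks)
    # Each block is mapped on its own, with no shared state. That independence
    # is what would let a real cluster run the map step on many machines at once.
    with ThreadPoolExecutor(max_workers=num_blocks) as pool:
        mapped = list(pool.map(map_block, blocks))
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    lines = ["Hadoop sorts data", "Hadoop tolerates failure", "data data data"]
    print(word_count(lines))
```

Because the map step never looks outside its own block, the result does not depend on how many blocks the input is split into. The sketch leaves out what the real system adds on top, such as replicating blocks and rerunning work after a hardware failure.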

It was Hadoop, of course, that took four terabytes of scanned archives from the New York Times and converted them to PDFs for display on the Times' Web site. It accomplished the task in less than 24 hours, using 100 machines in the Amazon EC2 cloud. This was one of the incidents that started to give cloud computing a good name back in 2007.

Mike Olson, CEO of Cloudera and former head of the company behind BerkeleyDB, says the launch of Cloudera Enterprise on June 29 was intended to extend Hadoop use beyond skilled Java programmers to a broader set of users. Currently, it takes a programmer to feel comfortable with Hadoop's command line interface. With Cloudera Enterprise, a Hadoop administrator gets graphical tools to "monitor, manage and control access to a Hadoop cluster," including the means to provision new servers for the cluster, accept identities supplied by Active Directory or LDAP identity management systems, and connect Hadoop to various systems monitoring tools, Olson said in an interview.

The goal is to smooth the deployment of Hadoop to take on the task of sorting and managing the masses of data being generated on Web sites, on trading exchanges and in scientific research projects. "Managing hundreds of machines in a cluster is always a problem," Olson said, and Hadoop users need all the help they can get to make use of the growing reams of data available to them.

In effect, Cloudera Enterprise is the Cloudera distribution of Hadoop itself, a production-tested version, combined with the tools and the user interface that Cloudera has been able to layer on top. It has rolled other open source code used with Hadoop into the package, such as Pig, a high-level language for programming Hadoop jobs, and Hive, the data warehouse system built on Hadoop. The announcement of Cloudera Enterprise didn't roil the waters all that much. Cloudera was expected to bring out a front-end set of management tools, and it did so at the Hadoop Summit held at Yahoo June 29. New users of these tools are likely to push Hadoop forward into a larger presence in cloud computing and monumental Web data handling tasks.

eBay is a major user of Hadoop. Anil Madan, its director of engineering, analytics platform development, said Cloudera Enterprise is a welcome addition to his daily task of coping with a mountain of data. "These new tools make it easy to perform critical activities including user access, authorization and lifecycle management of end user jobs," he said in the announcement.

Hadoop is available for free download from the Apache Software Foundation, where it has been a top-level project since 2008. A production version of Hadoop is also distributed free by Yahoo, which makes use of the system itself.


