Cloudera Gives Hadoop A User Interface, Deployment Tools
Hadoop, the system that harnesses a large cluster to sort masses of data, looks like a rapidly evolving piece of cloud software. Cloudera, the company whose staff includes original Hadoop author Doug Cutting, recently issued a Hadoop front end and deployment system to make it easier to use.
Cutting founded the Hadoop project while working at Yahoo. It's now an Apache open source project, with Cutting, many of his fellow Cloudera developers and his former colleagues at Yahoo contributing to it. Hadoop consists of MapReduce, the processing system that pulls data off a server's disk and maps it to the closest available CPU, and the Hadoop Distributed File System, which partitions a large file set across a cluster. Both use parallel processing to tap the potential of the cluster, and both can tolerate a hardware failure and route around it.
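To make the MapReduce idea concrete, here is a minimal single-process sketch in Python. It is not Hadoop's API, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical ones chosen for illustration. It only shows the pattern the paragraph above describes: input is split into chunks, each chunk is mapped independently, as it would be on separate cluster nodes, and the results are grouped and combined.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group the emitted values by key across all mapper outputs."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine each key's values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    # On a real cluster each chunk would be mapped on a different node;
    # here the map calls simply run one after another in one process.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))


# Three chunks stand in for pieces of a file spread across a cluster.
chunks = ["big data big cluster", "cluster of machines", "big machines"]
counts = word_count(chunks)
print(counts)
```

Because each call to map_phase touches only its own chunk, the map step can run in parallel, and a chunk whose machine fails can be mapped again elsewhere without redoing the rest of the job.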
It was Hadoop, of course, that took four terabytes of scanned archives from the New York Times and converted them to PDFs for display on the Times' Web site. It accomplished the task in less than 24 hours, using 100 machines in the Amazon EC2 cloud. This was one of the incidents that started to give cloud computing a good name back in 2007.
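A rough back-of-the-envelope calculation puts those figures in perspective. The sketch below assumes decimal terabytes and an even split of the work across the 100 machines, neither of which is stated in the account above, and it treats 24 hours as an upper bound on the run time, so the resulting rate is a lower bound.

```python
# Figures from the New York Times job described above.
TOTAL_BYTES = 4 * 10**12      # 4 TB, assuming decimal terabytes
MACHINES = 100                # EC2 instances used
MAX_SECONDS = 24 * 60 * 60    # job finished in under 24 hours

# Assumption: the input was divided evenly across the machines.
per_machine_bytes = TOTAL_BYTES / MACHINES
per_machine_gb = per_machine_bytes / 10**9

# Lower bound on sustained throughput per machine, in MB per second.
min_rate_mb_per_s = per_machine_bytes / MAX_SECONDS / 10**6

print(f"Data per machine: {per_machine_gb:.0f} GB")
print(f"Minimum sustained rate per machine: {min_rate_mb_per_s:.2f} MB/s")
```

Under these assumptions, each machine handled about 40 GB of input. The per-machine rate needed is modest, which suggests the achievement lay in spreading the work across many machines rather than in the speed of any one of them.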
Mike Olson, CEO of Cloudera and former head of the company behind BerkeleyDB, says the June 29 launch of Cloudera Enterprise was intended to extend Hadoop use beyond skilled Java programmers to a broader set of users. Currently, it takes a programmer to feel comfortable with Hadoop's command line interface. With Cloudera Enterprise, a Hadoop administrator gets graphical tools to "monitor, manage and control access to a Hadoop cluster," including the means to provision new servers for the cluster, accept identities supplied by Active Directory or LDAP identity management systems, and connect Hadoop to various systems monitoring tools, Olson said in an interview.
The goal is to smooth the deployment of Hadoop to take on the task of sorting and managing the masses of data being generated on Web sites, on trading exchanges and in scientific research projects. "Managing hundreds of machines in a cluster is always a problem," Olson said, and Hadoop users need all the help they can get to make use of the growing reams of data available to them.
In effect, Cloudera Enterprise is Cloudera's distribution of Hadoop itself, a production-tested version, combined with the tools and the user interface the company has been able to layer on top. It has rolled other open source code used with Hadoop into the package, such as Pig, the Hadoop programming language, and Hive, the data warehouse system built on Hadoop.
The announcement of Cloudera Enterprise didn't roil the waters all that much. Cloudera was expected to bring out a front-end set of management tools, and it did so at the Hadoop Summit held at Yahoo June 29. New users of these tools are likely to push Hadoop forward into a larger presence in cloud computing and monumental Web data handling tasks.
eBay is a major user of Hadoop. Anil Madan, the company's director of engineering, analytics platform development, said Cloudera Enterprise is a welcome addition to his daily task of coping with a mountain of data. "These new tools make it easy to perform critical activities including user access, authorization and lifecycle management of end user jobs," he said in the announcement.
Hadoop is available for free download from the Apache Software Foundation. It is an early stage project, still in the Apache Incubator, where project governance and initial mailing lists and methods of operation are set up. A production version of Hadoop is also distributed free by Yahoo, which makes use of the system itself.