Cloudera Gives Hadoop A User Interface, Deployment Tools
Hadoop, the system that harnesses a large cluster to sort masses of data, looks like a rapidly evolving piece of cloud software. Cloudera, the company whose staff includes original Hadoop author Doug Cutting, recently issued a Hadoop front end and deployment system to make it easier to use.
Cutting founded the Hadoop project while working at Yahoo. It's now an Apache open source project, with Cutting, many of his fellow Cloudera developers, and his former colleagues at Yahoo contributing to it. Hadoop consists of MapReduce, the cluster mapping system that pulls data off a server disk and maps it to the closest available CPU, and the Hadoop Distributed File System, which partitions a large file set across a cluster. Both use parallel processing to tap the potential of the cluster, and both can tolerate a hardware failure and route around it.
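For readers unfamiliar with the MapReduce pattern, the following is a minimal single-process sketch in Python of its three stages: map, shuffle and reduce. It is an illustration only, not Hadoop's API. The function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical, and the word-count job is an assumed example; a real Hadoop job would spread the map and reduce calls across the machines of a cluster and read its input from the Hadoop Distributed File System.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on an in-memory list.

    Hypothetical driver: on a cluster, each stage would run in parallel
    on many machines rather than in one loop.
    """
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["Hadoop sorts data", "Hadoop stores data"]))
```

Because each map call looks at only one record and each reduce call at only one key, the work can be split across many machines, which is the property that lets a cluster process the data in parallel.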
It was Hadoop, of course, that took four terabytes of scanned archives from the New York Times and converted them to PDFs for display on the Times' Web site. It accomplished the task in less than 24 hours, using 100 machines in the Amazon EC2 cloud. This was one of the incidents that started to give cloud computing a good name back in 2007.
Mike Olson, CEO of Cloudera and former head of the company behind BerkeleyDB, says the launch of Cloudera Enterprise on June 29 was intended to shift Hadoop use beyond the hands of skilled Java programmers to a broader set of users. Currently, it takes a programmer to feel comfortable with Hadoop's command line interface. With Cloudera Enterprise, a Hadoop administrator gets graphical tools to "monitor, manage and control access to a Hadoop cluster," including the means to provision new servers for the cluster, accept identities supplied by Active Directory or LDAP systems, and connect Hadoop to various systems monitoring tools, Olson said in an interview.
The goal is to smooth the deployment of Hadoop to take on the task of sorting and managing the masses of data being generated on Web sites, on trading exchanges and in scientific research projects. "Managing hundreds of machines in a cluster is always a problem," Olson said, and Hadoop users need all the help they can get to make use of the growing reams of data available to them.
In effect, Cloudera Enterprise is the Cloudera distribution of Hadoop itself, a production-tested version, combined with the tools and the user interface Cloudera has been able to layer on top. The company has rolled other open source code used with Hadoop into the package, such as the Hadoop programming language, Pig, and the data warehouse system built on Hadoop, Hive.
The announcement of Cloudera Enterprise didn't roil the waters all that much. Cloudera was expected to bring out a front-end set of management tools, and it did so at the Hadoop Summit held at Yahoo June 29. New users of these tools are likely to push Hadoop forward into a larger presence in cloud computing and monumental Web data handling tasks.
EBay is a major user of Hadoop, and Anil Madan, director of engineering, analytics platform development, said Cloudera Enterprise is a welcome aid in his daily task of coping with a mountain of data. "These new tools make it easy to perform critical activities including user access, authorization and lifecycle management of end user jobs," he said in the announcement.
Hadoop is available for free download from the Apache Software Foundation, where it has been a top-level project since 2008. A production version of Hadoop is also distributed free by Yahoo, which makes use of the system itself.