3/16/2009
06:33 PM

Cloudera's Cloud Could Unlock Doors For Open Source

The company's first commercial distribution of Hadoop lets cloud-based server clusters store petabytes' worth of information.

Open source gained a healthy dose of technical credibility on Monday as Cloudera rolled out the first commercial distribution of Hadoop, an open source framework capable of storing and processing petabytes' worth of information across clusters of servers.

Company officials said that releasing a commercial version of Hadoop, already proven in internal use at companies such as Google, Yahoo, and Facebook, was a natural next step.

"After working with large Hadoop deployments at companies like Facebook, Google, and Yahoo, we came to realize that people needed Hadoop installation, configuration, and management to be much easier," said Christophe Bisciglia, Cloudera's founder and former manager of Google's Hadoop cluster. "And I think we made it easier for everyone to store and process the same types of big data that large Web companies are using in their businesses."

To make the Cloudera Distribution for Hadoop easier to install and use, Cloudera also debuted a new portal, my.cloudera.com, on Monday. There, developers and users can run a Web-based configuration tool that generates packages tailored to their specific application requirements.

Users can save their cluster's individual settings on the portal, which enables automatic updates.

The new distribution is made up of several components.

The Hadoop Distributed File System (HDFS) is fault tolerant: it is designed on the assumption that hardware failure is normal rather than exceptional, and it detects failures quickly. MapReduce, Hadoop's programming model, divides an application into many small units of work that can be automatically parallelized and executed across large clusters. Hive is a data warehousing infrastructure built on top of Hadoop that gives users and developers tools for data summarization, querying, and analysis. Finally, Pig is a platform for analyzing large data sets in Hadoop, using a high-level language for expressing data analysis programs.
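To make the MapReduce description more concrete, here is a minimal, single-process Python sketch of the classic word-count job. It only simulates the map, shuffle, and reduce phases in memory and does not use Hadoop's actual Java API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical illustrations, not part of Hadoop or Cloudera's distribution.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical): emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer (hypothetical): combine all counts emitted for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Each line is an independent unit of work. On a real cluster these map
    # calls would run in parallel on many nodes; here they run sequentially.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    # Each key's reduce call is also independent and could run in parallel.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores big data", "Hadoop processes big data"]
    print(word_count(docs))
    # prints {'hadoop': 2, 'stores': 1, 'big': 2, 'data': 2, 'processes': 1}
```

The point of the split is that neither the mapper nor the reducer needs to see the whole data set: because each call works on one line or one key, a framework like Hadoop can spread those calls across as many machines as the cluster provides.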

Because Hadoop is written in Java, it can run on any platform with a Java runtime. In practice, however, approximately 90% of companies run it on Linux with 64-bit hardware.

Available now, Cloudera's Distribution for Hadoop is free of charge and is released under the Apache License, Version 2.0. Company officials said it will ship as a prepackaged RPM bundle for Red Hat Linux systems or as an Amazon EC2 image.

Users and developers who want to put Cloudera Distribution for Hadoop through its paces can download it and run it on Linux, Windows, or Mac OS. The basic image includes sample code and the components needed to use the product, including a master server and one node.

Cloudera officials said a preconfigured VMware image, designed to be used alongside the company's free online training, will also be available free of charge.


The shift to delivering IT through a utility model is poised to change the business computing landscape as we know it. InformationWeek has published an independent analysis of this topic.
