Open source gained an adult dose of technical credibility on Monday as Cloudera rolled out the first commercial version of Hadoop, a server product capable of storing petabytes worth of information.
Releasing a commercial version of Hadoop, already a proven success internally at companies like Google, Yahoo, and Facebook, was merely a matter of course, company officials said.
"After working with large Hadoop deployments at companies like Facebook, Google, and Yahoo, we came to realize that people needed Hadoop installation, configuration, and management to be much easier," said Christophe Bisciglia, Cloudera's founder and former manager of Google's Hadoop cluster. "And I think we made it easier for everyone to store and process the same types of big data that large Web companies are using in their businesses."
In order to make the Cloudera Distribution for Hadoop easier to install and use, Cloudera today also debuted a new portal, called my.cloudera.com. At this site, developers and users can use a Web-based configuration tool capable of creating packages that can be tailored to fit their specific application requirements.
Individual settings for the cluster can be saved on the portal so as to enable automatic updates.
The new distribution is made up of several constituent parts.
The Hadoop Distributed File System is fault tolerant and is built to assume that hardware failure is normal and has quick detection capabilities. The product's MapReduce feature divides applications into small segments of work that better prepares them for automatic parallelization and execution on large clusters. Hive is a data warehousing infrastructure built on top of Hadoop that gives users and developers the tools for analysis, data summary, and querying. Last is Pig, a platform for analyzing large data sets in Hadoop using a high-level language for expressing data analysis programs.
Hadoop is written in Java, which, of course, means it can run on any Java-enabled platform. However, approximately 90% of companies use it under Linux coupled with 64-bit hardware.
Available now, Cloudera's Distribution for Hadoop is free of charge and will be distributed under the Apache 2 software license. The product will be distributed as a prepackaged RPM bundle for Red Hat Linux systems or an Amazon EC2 image, company officials said.
Users and developers wanting to put Cloudera Distribution for Hadoop through its paces can download it and have their choice of running it on Linux, Windows, or Mac OS. The basic image comes with sample code and the components needed to use the product, including a master server and one node.
A preconfigured VMware image that users and developers can use in tandem with their free online training also will be made free of charge, Cloudera officials said.
The shift to delivering IT through a utility model is poised to change the business computing landscape as we know it. InformationWeek has published an independent analysis of this topic. Download the report here (registration required).