The new release combines CDH4, Cloudera's latest distribution of open-source Apache Hadoop software, and Cloudera Enterprise 4.0, the vendor's own deployment, monitoring, and systems management software for Hadoop. Both components are laced with new features and performance improvements aimed at enterprise IT shops. The vendor also announced Tuesday that it has reached the 250-member mark in its partner program. That's further evidence, Cloudera asserts, that it is the most established and embraced Hadoop player in the industry.
With this week's release, Cloudera is getting its distribution of the latest Apache Hadoop software out to the market ahead of Hortonworks, which is expected to announce its first general release of an Apache Hadoop software distribution at next week's Hadoop Summit in San Jose, CA. The Hortonworks distribution was previously in technology preview mode.
Both Cloudera and Hortonworks will be distributing open source software from Apache's Hadoop 2.3 release, which includes upgrades aimed at high-availability and improved security. The release includes a hot-failover for the NameNode (metadata server) of the Hadoop Distributed File System (HDFS), which has long been a single point of failure.
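As a rough illustration of what hot failover means in practice, the toy model below promotes a standby when the active node is reported down. It is only a sketch: the class `HaNameNodes`, its methods, and the node names are hypothetical and do not reflect how HDFS actually implements or configures NameNode high availability.

```python
class HaNameNodes:
    """Toy model of an active/standby NameNode pair (illustrative only)."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby
        # Both nodes start out healthy.
        self.healthy = {active: True, standby: True}

    def report_failure(self, node):
        """Mark a node as down and promote the standby if the active node failed."""
        self.healthy[node] = False
        if node == self.active and self.healthy[self.standby]:
            # Swap roles so requests keep being served without a restart.
            self.active, self.standby = self.standby, self.active

    def serve(self, request):
        """Route a metadata request to whichever node is currently active."""
        if not self.healthy[self.active]:
            raise RuntimeError("no healthy NameNode available")
        return f"{self.active} handled {request}"
```

With a single NameNode, the first failure would take the cluster down. In this sketch, the cluster only becomes unavailable once both nodes have failed.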
"If your NameNode fails, the backup takes over your cluster stays up and running," Omer Trajman, VP of technical solutions at Cloudera, told InformationWeek. "From what our customers are saying, that is exactly what they wanted."
The latest Apache software also improves security with new table and column permissions for the HBase NoSQL database component of Hadoop. Permissions can be used to control which users and groups have access to specific data, and new scheduler access control lists can be used to determine which groups can administer or submit jobs.
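The idea behind table and column permissions can be sketched in a few lines. This is a simplified model, not HBase's access-control API: the `AccessControl` class, the `@` prefix used here to mark group names, and the table and column names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AccessControl:
    # Maps (table, column) -> principals allowed to read.
    # A column of None means the grant covers the whole table.
    grants: dict = field(default_factory=dict)

    def grant(self, principal, table, column=None):
        """Give a user (or "@group") read access to a table or a single column."""
        self.grants.setdefault((table, column), set()).add(principal)

    def can_read(self, user, groups, table, column):
        """Return True if the user, or any of their groups, may read the column."""
        principals = {user} | {"@" + g for g in groups}
        table_level = self.grants.get((table, None), set())
        column_level = self.grants.get((table, column), set())
        return bool(principals & (table_level | column_level))


acl = AccessControl()
acl.grant("@analysts", "meter_readings")          # whole table, by group
acl.grant("alice", "customers", "info:email")     # single column, by user
```

Here any member of the hypothetical `analysts` group can read every column of `meter_readings`, while `alice` can read only the `info:email` column of `customers`.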
Cloudera and Hortonworks rival MapR, which has always rejected HDFS, doesn't agree that Hadoop's high-availability problem has been solved. According to Jack Norris, MapR's VP of marketing, HDFS federation techniques introduced to support high availability in Apache Hadoop 2.3 are prone to data loss and limit scalability in the largest Hadoop deployments.
MapR pitches its distribution of Hadoop as a high-performance alternative, replacing HDFS with a derivative of the Unix-based network file system (NFS) that's highly scalable and includes high-availability features. Proof that HDFS NameNode outages are a thing of the past would take away one of MapR's selling points.
Cloudera customer Opower, which offers Hadoop-powered comparative smart meter reporting for millions of utility customers, believes CDH4 (which it has yet to deploy) will solve more than the NameNode problem, according to Drew Hylbert, Opower's systems engineer.
"The NameNode [high-availability] is nice, so that's one less concern for the cluster, but for us, the biggest improvement is HBase replication support," Hylbert told InformationWeek, noting yet another high-availability enhancement.
Opower chose Cloudera over MapR last year in large part because it offered "superior experience and support" for HBase, according to Hylbert. "We're doing large amounts of high-frequency MapReduce jobs and we're writing directly into HBase, and Cloudera's HBase support and knowledge of the database seemed more impressive than MapR's," Hylbert said.
Cloudera Enterprise 4.0, the latest release of the vendor's systems-management software, is touted as a comprehensive differentiator. The upgrade reportedly eases deployment and management with a three-step high-availability configuration workflow that guides setup of the NameNode. Multi-Cluster Management lets administrators use a single instance of Cloudera Manager to control multiple clusters, and it's backward compatible with the CDH3 distribution. The management software also has new heat map visualizations that help administrators quickly identify problem nodes within large clusters so they can take action.
Looking toward integration with mainstream systems-management tools, the vendor also has introduced a Cloudera Manager API. For now it can be used for custom integration work, but the vendor expects to introduce certified integrations with popular data center software such as IBM Tivoli, HP OpenView, and other systems.
For Cloudera customers, the management software lets the vendor's support engineers quickly check the health and performance of a customer's total Hadoop deployment landscape.
There are no authoritative stats on just how many Hadoop deployments are out there running on Cloudera's free software distribution. As for paid support, the company's last official report noted a base of somewhere north of 100 customers. In short, the company's brand awareness now far exceeds its actual scale.
But with more enterprise-oriented features and more certified integrations with software partners (with Oracle being one of the company's most significant recent partnerships), it will be hard to dethrone Cloudera as leader of the fast-growing Hadoop market.