Hadoop Heat: Cloudera, NetApp Strike At EMC

Cloudera, NetApp announce partnership that promises new flexibility for big data, and new competition for EMC's Hadoop plans.

Doug Henschen, Executive Editor, Enterprise Apps

November 7, 2011


Cloudera and NetApp announced a partnership on Monday whereby the storage giant will make available Cloudera's Apache Hadoop distribution and enterprise management software, and Cloudera will support a NetApp Open Solution for Hadoop, a reference architecture for storage set for release in December.

The partnership is clearly a response to an alliance struck in May between NetApp rival EMC and MapR Technologies. As part of that deal, EMC entered the Hadoop enterprise support business in direct competition with Cloudera, and it incorporated MapR's software as part of a Greenplum HD Enterprise Edition Hadoop software distribution.

A Java-based platform for distributed data processing, Hadoop has gained interest and adoption in recent years on the strength of its ability to handle the big data encountered by Internet businesses and other organizations handling hundreds of terabytes, if not petabytes, of information. The storage opportunity has naturally attracted storage vendors EMC, which has $17 billion in annual revenue, and NetApp, its smaller rival with $5 billion in annual revenue.

Cloudera is the oldest and largest provider of enterprise support and Hadoop management software, with more than 100 customers, but it's a tiny company compared to NetApp and stands to gain a huge lift through that vendor's sales and distribution organization. As for the NetApp Open Solution for Hadoop, a reference architecture sounds like something you can hash out on a napkin, but the partners say the suggested configurations of software and hardware will speed deployment and have been tested in NetApp's labs to ensure performance.

Further, whereas the commodity servers typically used in Hadoop deployments limit flexibility of compute-capacity-to-storage ratios, Cloudera and NetApp say the Open Solution decouples storage and compute while also providing higher availability and reliability, and improved manageability for enterprise environments.

"In our approach, compute capacity can grow at the rate of the application requirements and storage can grow at the rate of the data requirements, and we think that's a huge benefit as customers start to build out their workloads," said Jeffrey O'Neal, NetApp's senior director of data center solutions.

As an example, where Hadoop nodes on pizza-box-style commodity servers often house eight drives, O'Neal said NetApp's hardware can put up to 14 2-TB drives behind a single compute node, with provisions for hot spares for better reliability. RAID storage is also built in for data protection. Disk drives are configured on trays, and because there are hot spares, failed drives can be swapped out without bringing down a node and removing a server. Further, the architecture provides NetApp NFS (Network File System) backup protection for the NameNode, a single point of failure in Hadoop deployments because the NameNode holds the file system metadata on which all other nodes depend.

Pointing out a key contrast with EMC's Hadoop offering, which is intended to be run on the EMC Greenplum Modular Data Computing Appliance (DCA), O'Neal said that the NetApp Open Solution does not "force you to use a particular database." That's a reference to the fact that the DCA can also run EMC's Greenplum database (but no other databases) for conventional relational data warehousing needs.

"Cloudera has supported connectivity with a huge variety of databases, including everything from Teradata and Netezza to Oracle, MySQL, and Vertica," said Ed Albanese, Cloudera's head of business development.

With IBM having released Hadoop-based BigInsights software and support, and Oracle and Microsoft having announced their intention to add their own Hadoop distributions and support (along with a Big Data Appliance from Oracle), it's clear that this data processing platform is headed for wider use.

Likening the NetApp-Cloudera reference architecture to an appliance configuration, MapR CEO John Schroeder said in a statement that the entry of commercial vendors into the Hadoop market will help make it "a safe choice" as a big data platform. "Most organizations run Hadoop by installing software on commodity hardware where you can purchase terabyte drives for less than $100," he said. "We'll see how the market responds to Hadoop appliance offerings."

About the Author

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of Transform Magazine, and Executive Editor at DM News. He has covered IT and data-driven marketing for more than 15 years.
