Cloudera, NetApp announce partnership that promises new flexibility for big data--and new competition for EMC's Hadoop plans.
Cloudera and NetApp announced a partnership on Monday whereby the storage giant will make available Cloudera's Apache Hadoop distribution and enterprise management software, and Cloudera will support a NetApp Open Solution for Hadoop, a reference architecture for storage set for release in December.
The partnership is clearly a response to an alliance struck in May between NetApp rival EMC and MapR Technologies. As part of that deal, EMC entered the Hadoop enterprise support business in direct competition with Cloudera, and it incorporated MapR's software as part of a Greenplum HD Enterprise Edition Hadoop software distribution.
A Java-based platform for distributed data processing, Hadoop has gained interest and adoption in recent years on the strength of its ability to handle the big data encountered by Internet businesses and other organizations handling hundreds of terabytes, if not petabytes, of information. The storage opportunity has naturally attracted storage vendors EMC, which has $17 billion in annual revenue, and NetApp, its smaller rival with $5 billion in annual revenue.
Cloudera is the oldest and largest provider of enterprise support and Hadoop management software, with more than 100 customers, but it's a tiny company compared to NetApp and stands to gain a huge lift through that vendor's sales and distribution organization. As for the NetApp Open Solution for Hadoop, a reference architecture sounds like something you can hash out on a napkin, but the partners say the suggested configurations of software and hardware will speed deployment and have been tested in NetApp's labs to ensure performance.
Further, whereas the commodity servers typically used in Hadoop deployments limit flexibility of compute-capacity-to-storage ratios, Cloudera and NetApp say the Open Solution decouples storage and compute while also providing higher availability and reliability, and improved manageability for enterprise environments.
"In our approach, compute capacity can grow at the rate of the application requirements and storage can grow at the rate of the data requirements, and we think that's a huge benefit as customers start to build out their workloads," said Jeffrey O'Neal, NetApp's senior director of data center solutions.
As an example, where Hadoop nodes on pizza-box-style commodity servers often house eight drives, O'Neal said NetApp's hardware can put up to 14 2-TB drives behind a single compute node, with provisions for hot spares for better reliability. RAID storage is also built in for data protection. Disk drives are configured on trays, and because there are hot spares, failed drives can be swapped out without bringing down a node or removing a server. Further, the architecture provides NetApp NFS (Network File System) backup protection for the NameNode, a single point of failure in Hadoop deployments because the NameNode manages the file system metadata that all other nodes depend on.
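To put the drive counts in perspective, here is a rough back-of-the-envelope sketch in Python. Only the figure of up to 14 2-TB drives per compute node comes from O'Neal. The function names, the two-spare configuration, and the 500-TB dataset are hypothetical, chosen for illustration, and the arithmetic ignores RAID overhead and HDFS replication, both of which reduce usable capacity.

```python
import math


def raw_capacity_tb(drives: int, drive_tb: float, hot_spares: int = 0) -> float:
    """Raw capacity in TB of the data drives behind one compute node.

    Hot spares hold no data, so they are subtracted from the drive count.
    RAID parity and HDFS replication are deliberately ignored here.
    """
    if hot_spares > drives:
        raise ValueError("hot_spares cannot exceed drives")
    return (drives - hot_spares) * drive_tb


def nodes_needed(dataset_tb: float, per_node_tb: float) -> int:
    """Smallest number of compute nodes whose raw capacity covers the dataset."""
    return math.ceil(dataset_tb / per_node_tb)


# Figure cited in the article: up to 14 drives of 2 TB each per compute node.
full_tray = raw_capacity_tb(drives=14, drive_tb=2.0)

# Hypothetical configuration: two of the 14 drives reserved as hot spares.
with_spares = raw_capacity_tb(drives=14, drive_tb=2.0, hot_spares=2)

# Hypothetical 500-TB dataset, raw capacity only.
nodes = nodes_needed(dataset_tb=500, per_node_tb=full_tray)

print(full_tray, with_spares, nodes)
```

Under these assumptions, a fully populated configuration holds 28 TB raw per compute node, and reserving spares trades some of that capacity for the ability to swap failed drives without downtime.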
Pointing out a key contrast with EMC's Hadoop offering, which is intended to be run on the EMC Greenplum Modular Data Computing Appliance (DCA), O'Neal said that the NetApp Open Solution does not "force you to use a particular database." That's a reference to the fact that the DCA can also run EMC's Greenplum database (but no other databases) for conventional relational data warehousing needs.
"Cloudera has supported connectivity with a huge variety of databases, including everything from Teradata and Netezza to Oracle, MySQL, and Vertica," said Ed Albanese, Cloudera's head of business development.
With IBM having released Hadoop-based BigInsights software and support, and Oracle and Microsoft having announced their intention to add their own Hadoop distributions and support (along with a Big Data Appliance from Oracle), it's clear that this data processing platform is headed for wider use.
Likening the NetApp-Cloudera reference architecture to an appliance configuration, MapR CEO John Schroeder said in a statement that the entry of commercial vendors into the Hadoop market will help make it "a safe choice" as a big data platform. "Most organizations run Hadoop by installing software on commodity hardware where you can purchase terabyte drives for less than $100," he said. "We'll see how the market responds to Hadoop appliance offerings."