EMC enhances Greenplum database links to Hadoop while adding a ready-made computing platform powered by Cisco servers.
EMC reached out to big data novices and veterans alike on Thursday by announcing Hadoop-friendly upgrades to its Greenplum database and a new partnership through which Cisco will offer Hadoop-ready racks of clustered hardware.
EMC offers two different paths to Hadoop aimed at two different constituencies. Greenplum HD is aimed primarily at Hadoop newbies who still do most of their data work in conventional relational databases. HD provides Apache Hadoop software and associated support, and these customers are most likely to run the software on the modular EMC Data Computing Appliance (DCA), which can run both Hadoop and the relational Greenplum database within a single box. The Greenplum database upgrades will appeal primarily to this crowd (more on that below).
Greenplum MR is EMC's performance-oriented Hadoop software and support package. It's aimed primarily at big data veterans who want more out of Hadoop and who are less likely to use the EMC DCA. These pros want pure Hadoop clusters, and that's exactly what the new Cisco offering provides.
The Cisco Unified Computing System (UCS) is more of a hardware configuration than an appliance, but it's ordered as a single product, which will save customers the trouble of pulling together the bits and pieces that go into building a Hadoop cluster.
"The system includes the required servers, networking gear, and operating system, and it's tuned and optimized for our software," said Michael Maxey, senior director of product management at EMC's Greenplum division, in an interview with InformationWeek.
A single-rack Cisco UCS configuration includes 192 processor cores and supports a typical user data capacity of 320 terabytes. Multi-rack configurations add 192 processor cores and an additional 320 terabytes of capacity for each additional rack. Maxey declined to detail pricing of the hardware, but he said it would be "highly competitive" with commodity deployments, considering the time savings and reliability of using quickly deployable, preconfigured hardware.
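For readers sizing a deployment, the per-rack figures above can be turned into a quick back-of-the-envelope calculation. The Python sketch below simply multiplies the quoted numbers by the rack count, as the article describes; the function name `cluster_totals` is a hypothetical helper for illustration, not part of any EMC or Cisco tool, and real usable capacity will likely vary with replication and configuration.

```python
# Per-rack figures quoted for the Cisco UCS configuration.
CORES_PER_RACK = 192
CAPACITY_TB_PER_RACK = 320  # "typical user data capacity" per rack


def cluster_totals(racks):
    """Return (total_cores, total_capacity_tb) for a given rack count.

    Assumes each rack contributes the same cores and capacity,
    which is how the multi-rack configurations are described.
    """
    if racks < 1:
        raise ValueError("a cluster needs at least one rack")
    return racks * CORES_PER_RACK, racks * CAPACITY_TB_PER_RACK


if __name__ == "__main__":
    for racks in (1, 2, 4):
        cores, capacity_tb = cluster_totals(racks)
        print(f"{racks} rack(s): {cores} cores, {capacity_tb} TB")
```

By this arithmetic, a four-rack cluster would come to 768 cores and 1,280 terabytes of typical user data capacity.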
Cisco UCS is ready to run Greenplum MR, which is based on MapR's blend of Hadoop software and proprietary software. In particular, MapR uses a Unix-based replacement for the Hadoop Distributed File System (HDFS), which MapR considers to be an unreliable component of Hadoop. There are other unique software elements aimed at high availability and performance, but MapR says its software is otherwise compatible with existing Hadoop deployments.
EMC's database upgrades are aimed primarily at customers who want to run Hadoop and the Greenplum database side by side within the Data Computing Appliance. The Greenplum Command Center management interface has been beefed up to enable administrators to stop, start, and prioritize Hadoop and conventional relational workloads across the DCA.
EMC has also expanded upon and improved integrations between Greenplum and Hadoop. The big win is a new capability to analyze Hadoop data within the Greenplum relational database without moving the data. Greenplum treats Hadoop as if it's just another database table. This promises to save lots of time and effort that would otherwise be spent shuffling data between two separate environments.
Improved support for EMC's Data Domain backup appliance will benefit Greenplum customers whether they're using Hadoop or not. By executing Data Domain deduplication capabilities inside the database, EMC said Greenplum now delivers faster and more manageable backups to Data Domain appliances.