EMC Tries To Unify Big Data Analytics

EMC Greenplum Modular Data Computing Appliance puts SQL and Hadoop in the same box, but is it a truly cohesive platform?

Two separate worlds have emerged in big data analytics, and on Wednesday EMC announced a Greenplum appliance that aims to bring them together.

On the one hand, there's structured data that fits neatly into the columns and rows of relational databases. Those databases have mastered such data, and even when it gets big (meaning north of about 10 terabytes), there are options such as the massively parallel processing offered by products like EMC's Greenplum database.

On the other hand there's the array of semi-structured, unstructured, and inconsistent data types like server log files, sensor data, social-network comments, and other forms of text-centric information. For that world the Hadoop open-source project has emerged as the leading platform for making such information computable. (Hadoop also handles highly structured data, but mostly as a high-capacity, low-cost data store.)

With Wednesday's release of the EMC Greenplum Modular Data Computing Appliance (DCA), EMC says it has unified these heretofore separate domains. It's a follow-up to the company's announcement last May of Greenplum HD Community and Enterprise distributions of Hadoop software and a promise to deliver a Hadoop appliance.

Greenplum's Community edition includes Hadoop MapReduce, the HDFS distributed file system, the Apache Hive query tool, the HBase column-oriented data store, and the ZooKeeper tool for configuring clusters. The Enterprise edition adds proprietary features for snapshotting and replication of Hadoop clusters, as well as system management capabilities.

The Modular DCA is one box that supports multiple quarter-rack deployments, which can be mixed, matched, and scaled. You can start with a standard Greenplum Database Module for scalable SQL analysis and add a quarter-rack Greenplum HD module for running EMC's Hadoop release.

Other quarter-rack options include the Greenplum Database High Capacity Module, which pairs more storage with less compute capacity than a standard module for high-scale, long-term archival storage at a lower cost per terabyte. There's also a Greenplum Data Integration Accelerator (DIA) module designed to host partner applications, such as predictive analytics capabilities from SAS, data-integration software from Informatica, and other options said to be in review.

EMC's modular approach lets you scale standard SQL, Hadoop, archival, or analytic application capacity in quarter-rack increments up to a total of six full racks. EMC says its approach will not only save money by eliminating the need for separate hardware platforms but will also speed insight and minimize storage demands by streaming Hadoop analyses directly into the Greenplum database. In this approach, data doesn't have to be created and stored in one environment and then copied and moved into another.
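
To make the scaling arithmetic concrete: six full racks at four quarter-rack slots each works out to 24 module slots. The Python sketch below is purely illustrative and is not tied to any EMC tooling. The short module labels and the `racks_needed` helper are hypothetical, and it assumes that every module occupies exactly one quarter rack and that any mix of module types is allowed.

```python
# Illustrative capacity check for a modular, quarter-rack appliance.
# Assumptions (not from EMC documentation): each module fills exactly one
# quarter rack, and any mix of module types is permitted.

QUARTER_RACKS_PER_RACK = 4
MAX_RACKS = 6  # ceiling cited in the article
MAX_MODULES = QUARTER_RACKS_PER_RACK * MAX_RACKS  # 24 quarter-rack slots

# Hypothetical short labels for the four module types the article names.
MODULE_TYPES = {"database", "high_capacity", "hd", "dia"}


def racks_needed(mix):
    """Return how many full racks a mix of quarter-rack modules occupies.

    `mix` maps a module label to the number of modules of that type.
    """
    unknown = set(mix) - MODULE_TYPES
    if unknown:
        raise ValueError(f"unknown module types: {sorted(unknown)}")
    if any(count < 0 for count in mix.values()):
        raise ValueError("module counts must be non-negative")

    total = sum(mix.values())
    if total > MAX_MODULES:
        raise ValueError(f"{total} modules exceeds the {MAX_MODULES}-slot limit")

    # Ceiling division: a partially filled rack still counts as a rack.
    return -(-total // QUARTER_RACKS_PER_RACK)


if __name__ == "__main__":
    starter = {"database": 1, "hd": 1}
    print(racks_needed(starter))
```

Under these assumptions, a starter configuration of one database module and one Hadoop module fits in a single rack with two quarter-rack slots to spare.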

EMC used the words "coprocessing" and "marriage" to describe the blend of SQL and Hadoop within the modular appliance, but it's not quite that harmonious just yet.
