5/10/2011
03:53 PM
Doug Henschen

EMC's Hadoop Move Points To Analysis Arms Race

Planned Greenplum appliance will bridge structured and unstructured data, and it's easy to see the industry's top vendors will follow with their own all-purpose analytic platforms.

Stepping up its pursuit of big-data analysis, EMC on Monday announced that it will release its own distributions of open-source Apache Hadoop distributed processing software, along with a related appliance that will analyze both structured and unstructured data on a single platform.

In a similar announcement, startup company DataStax on Monday released Brisk, a product that combines Apache Cassandra open-source software for large-scale transaction processing with a Hadoop distribution. The product provides a single platform combining a low-latency database for super high-volume Web and real-time applications with tightly coupled Hadoop analytics.

Throw SAP's well-publicized in-memory ambitions in with these new products, and a vision of the future emerges, with lots of leading IT vendors addressing mixed data-analysis on unified platforms, but more on that later.

Hadoop is quickly gaining adoption due to its ability to analyze massive volumes of unstructured information, a category that includes textual information, such as social-network comments and email messages, and machine-generated data, such as network logs, security logs, application logs and sensor data, that doesn't fit neatly into consistent columns and rows.

EMC says it will release EMC Greenplum HD Community Edition and Enterprise Edition distributions of Hadoop in the third quarter, along with a Greenplum HD Data Computing Appliance. The latter will combine the Greenplum database and the Enterprise Edition distribution of Hadoop on a single appliance.

This isn't the first effort to analyze structured and unstructured data on a single platform, but it is the first appliance to run a relational database and the Hadoop stack on a single hardware platform. The combination should appeal to customers because it promises to improve performance while eliminating redundant hardware.

Hadoop Appeal

Unstructured data can't be analyzed in conventional relational databases, so organizations swamped with tens or hundreds of terabytes or more rely on Hadoop, which can spread processing across tens, hundreds, or thousands of compute nodes on commodity servers, depending on the scale of the deployment. Hadoop also provides a MapReduce engine, which helps split up workloads when handling particularly large sets of unstructured data.
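To make the MapReduce idea concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop code and does not use any Hadoop API; the log-line format, the function names, and the sample data are all assumptions chosen for illustration. Each "split" stands in for a chunk of data that a real cluster would hand to a separate node.

```python
from collections import defaultdict

def map_phase(line):
    """Emit (key, 1) pairs. Assumed line format: '<timestamp> <LEVEL> <message>'."""
    parts = line.split()
    if len(parts) >= 2:
        yield parts[1], 1

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Collapse the values for one key into a single count."""
    return key, sum(values)

def run_job(splits):
    """Map every split independently, then shuffle and reduce the results."""
    mapped = []
    for split in splits:  # on a cluster, each split could run on its own node
        for line in split:
            mapped.extend(map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

# Hypothetical sample data: two splits of application log lines.
SPLITS = [
    ["t1 ERROR disk full", "t2 INFO started"],
    ["t3 ERROR timeout", "t4 WARN slow response"],
]
RESULT = run_job(SPLITS)
```

Because the map step looks at one line at a time and the reduce step looks at one key at a time, both can be spread across many machines, which is the property that lets this style of processing scale out on commodity servers.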

To date, Hadoop deployments and conventional relational data warehouses have run on separate hardware platforms, yet companies usually need to do SQL-style analysis of the data sets that emerge from Hadoop analyses. Thus, plenty of data-integration and data-warehouse-appliance vendors have partnered with Cloudera, which has a popular Hadoop distribution and is the leading provider of enterprise-grade Hadoop services and support.
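The hand-off described here, in which summarized output from a Hadoop job lands in a relational store for SQL queries, can be sketched roughly as follows. This uses Python's built-in sqlite3 module as a stand-in for a data warehouse; the table name, column names, and sample rows are hypothetical and do not reflect any vendor's actual integration.

```python
import sqlite3

# Hypothetical output of a Hadoop job: (log_level, count) rows.
HADOOP_OUTPUT = [("ERROR", 2), ("INFO", 1), ("WARN", 1)]

def load_and_query(rows):
    """Load job output into a relational table, then run a SQL query over it."""
    conn = sqlite3.connect(":memory:")  # in-memory database for illustration
    conn.execute("CREATE TABLE level_counts (level TEXT, n INTEGER)")
    conn.executemany("INSERT INTO level_counts VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT level, n FROM level_counts WHERE n > 1 ORDER BY n DESC"
    ).fetchall()
    conn.close()
    return result

TOP = load_and_query(HADOOP_OUTPUT)
```

In a two-platform setup the load step means moving data between separate systems, which is the overhead a combined appliance aims to remove.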

HP Vertica and Teradata, for example, integrate with Cloudera Hadoop deployments so data sets can be moved onto their platforms for further SQL analysis.

EMC Greenplum has also partnered with Cloudera, but with Monday's announcement it will effectively become a competitor by offering its own Hadoop software distributions, service and support, albeit with an emphasis on deployments on EMC appliances.

"With the amount of innovation that we see that's possible, it just makes much more sense for us to own the Hadoop distribution as part of our stack," said Luke Lonergan, a co-founder of Greenplum and chief technology officer of EMC's Data Computing Division.
