Commentary
EMC's Hadoop Move Points To Analysis Arms Race
Planned Greenplum appliance will bridge structured and unstructured data, and it's easy to see that the industry's top vendors will follow with their own all-purpose analytic platforms.
Stepping up its pursuit of big-data analysis, EMC on Monday announced that it will release its own distributions of open-source Apache Hadoop distributed processing software, along with a related appliance that will analyze both structured and unstructured data on a single platform.
In a similar announcement, startup company DataStax on Monday released Brisk, a product that combines Apache Cassandra open-source software for large-scale transaction processing with a Hadoop distribution. The product provides a single platform combining a low-latency database for super high-volume Web and real-time applications with tightly coupled Hadoop analytics.
Throw SAP's well-publicized in-memory ambitions in with these new products, and a vision of the future emerges in which many leading IT vendors address mixed-data analysis on unified platforms. But more on that later.
Hadoop is quickly gaining adoption due to its ability to analyze massive volumes of unstructured information, a category that includes textual information, such as social-network comments and email messages, and machine-generated data, such as network logs, security logs, application logs and sensor data, that doesn't fit neatly into consistent columns and rows.
EMC says it will release EMC Greenplum HD Community Edition and Enterprise Edition distributions of Hadoop in the third quarter, along with a Greenplum HD Data Computing Appliance. The latter will combine the Greenplum database and the Enterprise Edition distribution of Hadoop on a single appliance.
This isn't the first effort to analyze structured and unstructured data on a single platform, but it's the first appliance to run a relational database and the Hadoop stack on a single hardware platform. The combination should appeal to customers because it promises to improve performance while eliminating redundant hardware.
Hadoop Appeal
Unstructured data can't be analyzed in conventional relational databases, so organizations swamped with tens or hundreds of terabytes or more rely on Hadoop, which can spread processing across tens, hundreds, or thousands of compute nodes on commodity servers, depending on the scale of the deployment. Hadoop also provides a MapReduce engine, which helps split up workloads when handling particularly large sets of unstructured data.
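To make the MapReduce idea concrete, here is a minimal single-process sketch in Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample log lines are hypothetical, and the "nodes" are simply slices of a list. It only illustrates how a workload can be split into chunks, mapped independently, and then combined.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (status_code, 1) pair for each log line in one chunk."""
    pairs = []
    for line in chunk:
        fields = line.split()
        if fields:
            # Assumption: the status code is the last field of each line.
            pairs.append((fields[-1], 1))
    return pairs


def shuffle(mapped_chunks):
    """Group the values emitted by all chunks under their keys."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def run_job(lines, num_chunks):
    """Split the input, map each chunk on its own, then shuffle and reduce."""
    # Each slice stands in for the share of data one compute node would hold.
    chunks = [lines[i::num_chunks] for i in range(num_chunks)]
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))


# Hypothetical sample data, made up for illustration.
LOG_LINES = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
    "GET /index.html 200",
    "GET /admin 500",
]

counts = run_job(LOG_LINES, num_chunks=3)
print(counts)
```

Because each chunk is mapped without reference to the others, the result does not depend on how many chunks the input is split into, which is the property that lets a real cluster spread the map step across many servers.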
To date, Hadoop deployments and conventional relational data warehouses have run on separate hardware platforms, yet companies usually need to do SQL-style analysis of the data sets that emerge from Hadoop analyses. Thus, plenty of data-integration and data-warehouse-appliance vendors have partnered with Cloudera, which has a popular Hadoop distribution and is the leading provider of enterprise-grade Hadoop services and support.
HP Vertica and Teradata, for example, integrate with Cloudera Hadoop deployments so that data sets can be moved onto their platforms for further SQL analysis.
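As a rough illustration of that hand-off, the sketch below loads a small aggregated data set into a relational table and queries it with SQL. Python's built-in `sqlite3` module stands in for the data warehouse; it is not how Vertica, Teradata, or Greenplum integrate with Hadoop. The table name, column names, and sample rows are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical output of a Hadoop job: (status_code, hits) pairs.
hadoop_output = [(200, 3), (404, 1), (500, 1)]

# An in-memory SQLite database stands in for the relational warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE status_counts (status INTEGER PRIMARY KEY, hits INTEGER)"
)
conn.executemany("INSERT INTO status_counts VALUES (?, ?)", hadoop_output)

# SQL-style follow-up analysis: total hits that ended in an error status.
error_hits = conn.execute(
    "SELECT SUM(hits) FROM status_counts WHERE status >= 400"
).fetchone()[0]
print(error_hits)

conn.close()
```

The point of the pattern is that the heavy processing of raw data happens first, and only the much smaller structured result is moved into the relational system for querying.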
EMC Greenplum has also partnered with Cloudera, but with Monday's announcement it will effectively become a competitor by offering its own Hadoop software distributions, service and support, albeit with an emphasis on deployments on EMC appliances.
"With the amount of innovation that we see that's possible, it just makes much more sense for us to own the Hadoop distribution as part of our stack," said Luke Lonergan, a co-founder of Greenplum and chief technology officer of EMC's Data Computing Division.