04:57 PM
Doug Henschen

4 Hadoop Helpers Promise Speedy Big-Data Analysis

Integration vendors launch commercial add-on products for the hot open-source framework. Here's how four products streamline high-volume workloads.

Apache Hadoop is one of the fastest-growing open-source projects going, so it's no surprise that commercial vendors are looking for a piece of the action.

Witness a spate of recent announcements from well-known data-integration vendors including Informatica, Pervasive Software, SnapLogic, and Syncsort, all of which are aimed at making it faster and easier to work with a very young big-data processing platform.

To recap, Hadoop is a collection of distributed data-processing components for analyzing large volumes of unstructured data, such as Facebook comments and Twitter tweets, email and instant messages, and security and application logs. Relational databases, such as IBM DB2, Oracle, Microsoft SQL Server, and MySQL, can't handle this data because it doesn't fit neatly into columns and rows.

Even if these commercial databases could do the job, the cost of the licenses would be prohibitive because of the scale of the data. We're generally talking about hundreds of terabytes, and into the petabytes.

Because Hadoop is an open-source project, its software distributions can be downloaded for free, and the software is designed to scale out on low-cost commodity servers. There aren't legions of companies that need Hadoop, but the capabilities and economies have attracted outfits including AOL, eHarmony, eBay, Facebook, JP Morgan Chase, LinkedIn, Netflix, The New York Times, and Twitter.

Hadoop is getting to be a magnet for commercial vendors. Cloudera offers a popular distribution of Hadoop, and it's the leading provider of enterprise support and services. Datameer offers supporting data-integration, storage, analytics, and visualization software, and Karmasphere adds a graphical environment for developing, debugging, and monitoring Hadoop jobs.

EMC announced Monday that it will offer its own distributions of Hadoop software: one open-source edition and one commercial enterprise edition that includes proprietary components. As I covered in my last column, EMC also announced an appliance capable of running the EMC Greenplum relational database and Hadoop on a single hardware platform.

Informatica and SnapLogic

Data-integration vendors Informatica and SnapLogic both announced partnerships with EMC this week. Informatica says it will integrate its data-integration platform with the EMC Hadoop distributions, which are set for release in the third quarter. Informatica previously partnered with Cloudera on a similar integration.

Informatica is the largest independent data-integration vendor out there, with more than 4,200 customer firms, so EMC and Cloudera need Informatica every bit as much as Informatica wants big-data-crunching Hadoop users.

SnapLogic announced SnapReduce, a module for the SnapLogic platform that will pipe data into MapReduce, the core Hadoop data-processing framework. SnapLogic will also introduce its own version of the Hadoop Distributed File System (HDFS), which will let Hadoop users pull data from the many sources handled by the SnapLogic platform and send data the other way, too. Both products are expected in the second half of this year.

I've previously reported on Hadoop-supporting tools from Talend, an open-source data-integration vendor, and from Quest Software. Most integration partnerships are aimed at making it easier to get data into and out of Hadoop. In the case of Syncsort and Pervasive, commercial add-on products are aimed at speeding processing within Hadoop.
