Syncsort and Pervasive
Syncsort earlier this month announced plans for a Hadoop Edition of its DMExpress data integration software. The edition will include connectivity to the aforementioned HDFS and a plug-in for Hadoop that will let customers take advantage of DMExpress's advanced sorting capabilities, including ascending, descending, reverse-order, and key-range-specified sorts. According to Syncsort, better sorting can deliver 2X performance improvements in Hadoop.
As with the Informatica, SnapLogic, and Talend integrations, Syncsort says DMExpress Hadoop Edition will also provide an easy-to-use, GUI-driven data-integration environment that addresses both conventional data warehouses and Hadoop environments. The Hadoop edition will be released later this year.
Pervasive's Hadoop product is Data Rush, a tool that optimizes concurrent, parallel processing within Hadoop. It does so by introducing the data-flow parallel programming techniques that Pervasive mastered long ago in its conventional data-integration software. Pervasive says the product delivers 4X to 9X performance improvements on MapReduce jobs, and that the company is developing applications for Hive, the Hadoop-related data warehouse, and Pig, the Hadoop data-flow programming language.
Concurrent processing is "really hard, but very necessary," according to Philip Kromer, CTO at Infochimps, a data supplier and Data Rush customer. "We might run eight different copies of programs across 50 machines, and it's essential to take full advantage of the processing power of each and every core."
In one application, Infochimps uses Hadoop running on Amazon's EC2 platform to extract data from Twitter feeds. Kromer says the company has seen performance gains of 2X and higher in pilot tests involving tens to hundreds of gigabytes, cutting 16-hour jobs down to four to eight hours. That makes it possible to harvest more data and serve more customers while also reducing computing costs. Once Infochimps scales the deployment up into production use, the time and cost savings will be even more dramatic, Kromer says.
I'm not alone in expecting plenty of other commercial software vendors to jump on the Hadoop bandwagon. "Considering all the user, vendor, and venture-capital mania for all things big-data at the moment, I'm confident the Hadoop market will grow to several billion dollars over the next several years," says Forrester analyst James Kobielus. That estimate includes cloud and on-premises implementations as well as related deployment, services, and support revenue.
The eBays, Facebooks, Netflixes, and Twitters of this world are flashy examples. But what gets me excited is seeing a financial giant like JPMorgan Chase giving Hadoop a try. A choice of proven, well-supported commercial tools will only help Hadoop grow.