Inside IBM's Big Data, Hadoop Moves

IBM DB2 adds in-memory analysis and compression tricks, while PureData System for Hadoop arrives as an appliance. But will IBM beat other tortoises in the Hadoop race?
IBM was ahead of both Oracle and Microsoft in embracing Hadoop, and it took a different path by introducing its own basic and enterprise distributions of its BigInsights Hadoop software in May 2011. Oracle and Microsoft entered the market in 2012 through partnerships with Cloudera and Hortonworks, respectively.

Now that IBM is announcing its own appliance, the PureData System for Hadoop, the in-house path will give it the advantage of offering a "100% IBM solution with our software distribution and our hardware," said Nancy Kopp, IBM's director of big data, in an interview with InformationWeek.

There will be two key differentiators from the Hadoop appliances that are either on the market (from EMC and Oracle) or in the works (from Teradata), Kopp said. "We saw that there's a key use case emerging for Hadoop as an archival system, so we've built archive capabilities right into the appliance," said Kopp. This will enable customers to offload data from warehouses for cold storage or archival compliance. The data remains active, however, so customers can retrieve it and restore it to faster analytic databases.

The second differentiator, according to Kopp, is a family of analytic accelerators starting with three: one for social data, one for text analytics and one for machine data. "The accelerators will make it easier to develop applications that take advantage of these data types," said Kopp, and she added that new accelerators will join the family in the future.

Beating the likes of Oracle and Microsoft on Hadoop is one thing. The question now is whether these giants will be the tortoises that ultimately finish ahead of big data hares like Cloudera and MapR. Cloudera, in particular, is well ahead in bringing Hadoop to large enterprises, with hundreds of deployments. By contrast, you seldom hear about BigInsights, and IBM refuses to disclose the number of customers running the software. At least one customer, MoneyGram, was set to participate in Wednesday's announcement.

IBM has addressed the same key Hadoop drawbacks that other distributors have tackled, including reliability and availability concerns tied to Hadoop's NameNode and the limited and slow SQL query capabilities of Apache Hive. On this last note, the upgraded BigInsights distribution announced Wednesday and set for release in the second quarter will include BigSQL, IBM's answer to SQL-on-Hadoop analysis.

EMC is set to release its remedy for Hive shortcomings with its Pivotal release later this month, but it looks like IBM will have BigSQL ahead of Cloudera's Impala, Hortonworks' Stinger and MapR's Drill initiatives.

As to the tortoise-and-hare question, Bloor says vendors that control the hardware will have advantages.

"My money would be on the boys with the iron, because they can look at the big picture, and as long as they get their pricing correct, then they're probably going to be able to do a better job than vendors that are limited to software," he said.

That suggests that IBM -- as well as EMC/VMware, HP, Intel, Oracle and no doubt others to come -- will have advantages. Which tortoise will win? We'll have to wait years to find out.
