Commentary
4/3/2013
11:40 AM
Doug Henschen

Inside IBM's Big Data, Hadoop Moves

IBM DB2 adds in-memory analysis and compression tricks, while PureData System for Hadoop arrives as an appliance. But will IBM beat other tortoises in the Hadoop race?

IBM is making a series of high-profile analytics announcements on Wednesday from its Almaden Research Center in San Jose, Calif., a fitting location at the epicenter of big data activity. The themes of the announcements will sound familiar because they've been the subject of announcements by plenty of competitors in recent years, but IBM contends it's setting new standards of performance.

There are two major announcements. The first is BLU Acceleration, a combination of compression, in-memory analysis and vector-processing techniques that IBM says will drive huge improvements in relational database performance. BLU is set for release in the second quarter, and it will benefit DB2 first and foremost. IBM is also bringing the technology to the Informix database with this release and, according to sources, to the Netezza database in future releases.

The second announcement is IBM PureData System for Hadoop, an appliance-based platform that customers will be able to scale out by simply adding more boxes. The hardware will run an upgraded version of IBM's InfoSphere BigInsights Hadoop distribution that's also being announced on Wednesday.

In typical IBM fashion, the announcements are loaded with bravado about the billions the company has spent acquiring software companies in recent years, but there's plenty of substance behind the buzzwords.

With BLU Acceleration, IBM is taking advantage of the same breakthroughs in low-cost memory and processing power that SAP has been talking about in connection with its Hana in-memory database. BLU is not IBM's answer to Hana, however. The focus for now is strictly on analytics and does not, as yet, address transaction processing, so it's more of a competitive response -- bar raising, IBM contends -- to the likes of Teradata, HP Vertica and EMC Greenplum, and data warehousing uses of Oracle Exadata and Oracle Exalytics.

The techniques employed by BLU include hybrid row and columnar storage, advanced compression, data skipping, vector processing and leveraging of increasingly affordable memory to speed processing. We've seen all these techniques before -- mixed columnar and row from Teradata, HP Vertica and EMC Greenplum, data skipping from Infobright and IBM's Netezza database, vector processing from Actian (formerly Ingres), and aggressive use of memory from multiple vendors -- but IBM is alone in putting all of these techniques together.
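For readers unfamiliar with data skipping, here is a minimal Python sketch of the general idea: keep a small minimum/maximum summary for each block of a column, and skip any block whose range cannot satisfy the query. This is not IBM's implementation, which the company has not detailed here. The names `Block`, `build_blocks` and `scan_range`, the block size and the sample data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """One chunk of a column plus its min/max summary (hypothetical layout)."""
    values: List[int]
    lo: int
    hi: int


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_range(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return values in [low, high] and the number of blocks actually read."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        # The summary proves no value in this block can match, so skip it.
        if block.hi < low or block.lo > high:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read


# Illustrative data: 1,000 sorted values split into 10 blocks of 100.
column = list(range(1000))
blocks = build_blocks(column, block_size=100)
matches, blocks_read = scan_range(blocks, 250, 260)
print(f"read {blocks_read} of {len(blocks)} blocks, found {len(matches)} values")
```

The payoff depends heavily on how the data is laid out. Sorted or clustered values let most blocks be skipped, while randomly scattered values may force the scan to read nearly every block.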

With BLU, IBM says it will be able to crunch 10 terabytes down to 1 terabyte; bring that 1 terabyte into memory; and effectively crunch it again down to 10 gigabytes. With the data-skipping technology, the database can then focus in on the 1 gigabyte that matters to a query without wading through repeating or irrelevant data. Your mileage may vary, as the saying goes, but IBM reports that BLU improves performance by 8X to 25X over the last DB2 release (10.1).
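The arithmetic behind IBM's example can be checked with a short Python sketch. The stage sizes come from the figures above; the stage labels and the use of decimal units (1 terabyte = 1,000 gigabytes) are assumptions made for illustration, and none of this says anything about what a real workload would see.

```python
# Assumption: decimal units, 1 TB = 1,000 GB.
GB_PER_TB = 1000

# Stage sizes in gigabytes, taken from IBM's example; labels are illustrative.
stages = [
    ("starting data", 10 * GB_PER_TB),
    ("after the first reduction", 1 * GB_PER_TB),
    ("after the second, in-memory reduction", 10),
    ("after data skipping", 1),
]

# Reduction factor of each stage relative to the one before it.
factors = []
for (_, before), (name, after) in zip(stages, stages[1:]):
    factor = before // after
    factors.append(factor)
    print(f"{name}: {after:,} GB ({factor}x smaller than the previous stage)")

# End-to-end reduction from the starting data to what the query touches.
overall = stages[0][1] // stages[-1][1]
print(f"overall: {overall:,}x less data for the query to examine")
```

Taken at face value, the three steps are 10X, 100X and 10X reductions, so the query touches one ten-thousandth of the starting data. That is a separate figure from the 8X to 25X speedup IBM reports, which measures query performance rather than data volume.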

All these enhancements will undoubtedly reassure existing DB2 customers that the database roadmap is keeping up with state-of-the-art features. "What's novel in IBM's approach is that it's doing acceleration in several ways," analyst Robin Bloor of Bloor Group told InformationWeek. Whether BLU sets new performance benchmarks and changes market share dynamics remains to be seen.

"The 25X figure is very aggressive, but when people are making purchase decisions, they do proof-of-concept benchmarks and put one database up against another," said Bloor. "You don't actually know what kind of performance you'll get until you've done the comparisons."

What IBM has yet to address with BLU, and what will likely require more extensive use of memory, is handling transaction processing as well as analytics. That's what SAP is doing with Hana, it's what Microsoft has announced it will do with Project Hekaton (expected in 2015) and it's what Oracle is rumored to be working on for a future Oracle Database release.

"We do see an evolution of this technology beyond reporting and analytic workloads, but I can't comment on a timeframe for that," Tim Vincent, IBM fellow, VP and chief technology officer, told InformationWeek. If IBM follows the same pattern it took in introducing the BLU technologies, it might wait to see what others do before attempting to do them one better.
