Inside IBM's Big Data, Hadoop Moves - InformationWeek


11:40 AM
Doug Henschen

Inside IBM's Big Data, Hadoop Moves

IBM DB2 adds in-memory analysis and compression tricks, while PureData System for Hadoop arrives as an appliance. But will IBM beat other tortoises in the Hadoop race?


IBM is making a series of high-profile analytics announcements on Wednesday from its Almaden Research Center in San Jose, Calif., a fitting location at the epicenter of big data activity. The themes of the announcements will sound familiar because they've been the subject of announcements by plenty of competitors in recent years, but IBM contends it's setting new standards of performance.

There are two major announcements. The first is BLU Acceleration, a combination of compression, in-memory analysis and vector-processing techniques that IBM says will drive huge improvements in relational database performance. BLU is set for release in the second quarter, and it will benefit DB2 first and foremost. IBM is also bringing the technology to the Informix database with this release and, according to sources, to the Netezza database in future releases.

The second announcement is IBM PureData System for Hadoop, an appliance-based platform that customers will be able to scale up by simply adding more boxes. The hardware will run an upgraded version of IBM's InfoSphere BigInsights Hadoop distribution that's also being announced on Wednesday.

In typical IBM fashion, the announcements are loaded with bravado about the billions the company has spent acquiring software companies in recent years, but there's plenty of substance behind the buzzwords.


With BLU Acceleration, IBM is taking advantage of the same breakthroughs in low-cost memory and processing power that SAP has been talking about in connection with its Hana in-memory database. BLU is not IBM's answer to Hana, however. The focus for now is strictly on analytics and does not, as yet, address transaction processing, so it's more of a competitive response -- bar raising, IBM contends -- to the likes of Teradata, HP Vertica and EMC Greenplum, and data warehousing uses of Oracle Exadata and Oracle Exalytics.

The techniques employed by BLU include hybrid row and columnar storage, advanced compression, data skipping, vector processing and leveraging of increasingly affordable memory to speed processing. We've seen all these techniques before -- mixed columnar and row from Teradata, HP Vertica and EMC Greenplum, data skipping from Infobright and IBM's Netezza database, vector processing from Actian (formerly Ingres), and aggressive use of memory from multiple vendors -- but IBM is alone in putting all of these techniques together.
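To make data skipping concrete, here is a minimal Python sketch of the general idea: keep a small min/max summary for each block of a column, and skip any block whose summary shows it cannot contain matching values. This is an illustration only, not IBM's implementation. The names Block, build_blocks and count_in_range are hypothetical, as are the block size and the sample data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A chunk of one column plus the min/max summary kept alongside it."""
    values: List[int]
    lo: int
    hi: int


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(values=chunk, lo=min(chunk), hi=max(chunk)))
    return blocks


def count_in_range(blocks: List[Block], low: int, high: int) -> Tuple[int, int]:
    """Count values in [low, high]; return (matches, blocks actually scanned)."""
    matches = 0
    scanned = 0
    for block in blocks:
        # The summary alone proves this block holds no match, so skip it.
        if block.hi < low or block.lo > high:
            continue
        scanned += 1
        matches += sum(1 for v in block.values if low <= v <= high)
    return matches, scanned


# Hypothetical sample data: 1,000 sorted values in blocks of 100.
column = list(range(1000))
blocks = build_blocks(column, block_size=100)
matches, scanned = count_in_range(blocks, 250, 260)
print(f"{matches} matches found by scanning {scanned} of {len(blocks)} blocks")
```

The sample data is sorted, which is the best case for this technique. On data with no ordering, most blocks would span the full range of values and few could be skipped.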

With BLU, IBM says it will be able to crunch 10 terabytes down to 1 terabyte; bring that 1 terabyte into memory; and effectively crunch it again down to 10 gigabytes. With the data-skipping technology, the database can then focus in on the 1 gigabyte that matters to a query without wading through repeating or irrelevant data. Your mileage may vary, as the saying goes, but IBM reports that BLU improves performance by 8X to 25X over the last DB2 release (10.1).
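The arithmetic behind that example can be worked through in a few lines of Python. The stage sizes are IBM's illustrative figures as quoted above. The stage labels and the use of decimal units (1 terabyte = 1,000 gigabytes) are assumptions made here for the calculation.

```python
GB_PER_TB = 1000  # assumption: decimal units

# Stage sizes in gigabytes, taken from IBM's example; labels are ours.
stages = [
    ("raw data", 10 * GB_PER_TB),
    ("after first compression", 1 * GB_PER_TB),
    ("compressed again in memory", 10),
    ("touched after data skipping", 1),
]

# Reduction factor between each pair of consecutive stages.
step_factors = [
    stages[i][1] / stages[i + 1][1] for i in range(len(stages) - 1)
]

# Overall reduction from raw data to the data a query actually touches.
overall_factor = stages[0][1] / stages[-1][1]

for (name, _), factor in zip(stages[1:], step_factors):
    print(f"{name}: {factor:g}x smaller than the previous stage")
print(f"overall: {overall_factor:g}x less data touched")
```

Under those assumptions the steps work out to 10x, 100x and 10x, or 10,000x less data touched overall. That is a reduction in data volume, not in query time, which is why it does not contradict the 8X to 25X performance figure IBM reports.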

All these enhancements will undoubtedly reassure existing DB2 customers that the database roadmap is keeping up with state-of-the-art features. "What's novel in IBM's approach is that it's doing acceleration in several ways," analyst Robin Bloor of Bloor Group told InformationWeek. Whether BLU sets new performance benchmarks and changes market-share dynamics remains to be seen.

"The 25X figure is very aggressive, but when people are making purchase decisions, they do proof-of-concept benchmarks and put one database up against another," said Bloor. "You don't actually know what kind of performance you'll get until you've done the comparisons."

What IBM has yet to address with BLU, and what will likely require more extensive use of memory, is handling transaction processing alongside analytics. That's what SAP is doing with Hana, it's what Microsoft has announced it will do with project Hekaton (expected in 2015) and it's what Oracle is rumored to be working on for a future Oracle Database release.

"We do see an evolution of this technology beyond reporting and analytic workloads, but I can't comment on a timeframe for that," Tim Vincent, an IBM fellow, VP and chief technology officer, told InformationWeek. If IBM follows the same pattern it took in introducing the BLU technologies, it might wait to see what others do before attempting to do them one better.
