IBM DB2 adds in-memory analysis and compression tricks, while PureData System for Hadoop arrives as an appliance. But will IBM beat other tortoises in the Hadoop race?
IBM is making a series of high-profile analytics announcements on Wednesday from its Almaden Research Center in San Jose, Calif., a fitting location at the epicenter of big data activity. The themes of the announcements will sound familiar because they've been the subject of announcements by plenty of competitors in recent years, but IBM contends it's setting new standards of performance.
There are two major announcements. The first is BLU Acceleration, a combination of compression, in-memory analysis and vector-processing techniques that IBM says will drive huge improvements in relational database performance. BLU is set for release in the second quarter, and it will benefit DB2 first and foremost. IBM is also bringing the technology to the Informix database with this release and, according to sources, to the Netezza database in future releases.
The second announcement is IBM PureData System for Hadoop, an appliance-based platform that customers will be able to scale up by simply adding more boxes. The hardware will run an upgraded version of IBM's InfoSphere BigInsights Hadoop distribution that's also being announced on Wednesday.
In typical IBM fashion, the announcements are loaded with bravado about the billions the company has spent acquiring software companies in recent years, but there's plenty of substance behind the buzzwords.
With BLU Acceleration, IBM is taking advantage of the same breakthroughs in low-cost memory and processing power that SAP has been talking about in connection with its Hana in-memory database. BLU is not IBM's answer to Hana, however. The focus for now is strictly on analytics and does not, as yet, address transaction processing, so it's more of a competitive response -- bar raising, IBM contends -- to the likes of Teradata, HP Vertica and EMC Greenplum, and data warehousing uses of Oracle Exadata and Oracle Exalytics.
The techniques employed by BLU include hybrid row and columnar storage, advanced compression, data skipping, vector processing and leveraging of increasingly affordable memory to speed processing. We've seen all these techniques before -- mixed columnar and row storage from Teradata, HP Vertica and EMC Greenplum, data skipping from Infobright and IBM's Netezza database, vector processing from Actian (formerly Ingres), and aggressive use of memory from multiple vendors -- but IBM is alone in putting all of these techniques together.
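To see why two of those techniques matter, consider a toy contrast between row-at-a-time processing and columnar, batch-oriented ("vectorized") processing. This sketch is purely illustrative -- the table, column names and data are hypothetical, and real engines like DB2 BLU operate on compressed, encoded column blocks at vastly larger scale -- but it shows why a query that touches one or two columns is cheaper when each column is stored contiguously and filtered as a batch.

```python
# Row-oriented layout: one record per tuple; a scan touches every field.
rows = [
    {"region": "east", "sales": 100},
    {"region": "west", "sales": 250},
    {"region": "east", "sales": 175},
]

# Row-at-a-time query: iterate records, inspect each one individually.
total_row = sum(r["sales"] for r in rows if r["region"] == "east")

# Columnar layout: each attribute stored contiguously, so a query
# reads only the columns it actually needs.
region = ["east", "west", "east"]
sales = [100, 250, 175]

# Vector-style processing: evaluate the predicate over the whole
# column at once, then aggregate the surviving values in one pass.
mask = [r == "east" for r in region]
total_col = sum(s for s, keep in zip(sales, mask) if keep)

print(total_row, total_col)  # both queries agree: 275 275
```

In a real columnar engine the "mask" step runs over compressed blocks using SIMD instructions, which is where the vector-processing speedups come from.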
With BLU, IBM says it will be able to crunch 10 terabytes down to 1 terabyte; bring that 1 terabyte into memory; and effectively crunch it again down to 10 gigabytes. With the data-skipping technology, the database can then home in on the 1 gigabyte that matters to a query without wading through repeating or irrelevant data. Your mileage may vary, as the saying goes, but IBM reports that BLU improves performance by 8X to 25X over the last DB2 release (10.1).
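Data skipping itself is a simple idea: keep lightweight min/max metadata for each block of column values (Netezza calls these zone maps), and exclude any block whose range cannot satisfy the query predicate before reading a single row. The sketch below is a hypothetical illustration with made-up block sizes and values, not IBM's implementation, but it shows the mechanism.

```python
# Each block carries min/max summary metadata alongside its values.
blocks = [
    {"min": 1,   "max": 50,  "values": [3, 17, 42, 50]},
    {"min": 51,  "max": 120, "values": [51, 99, 120]},
    {"min": 121, "max": 200, "values": [150, 180, 200]},
]

def query_greater_than(threshold):
    """Return (rows actually scanned, matching values)."""
    scanned, hits = 0, []
    for b in blocks:
        if b["max"] <= threshold:
            continue  # skip the whole block: no row in it can qualify
        scanned += len(b["values"])
        hits.extend(v for v in b["values"] if v > threshold)
    return scanned, hits

scanned, hits = query_greater_than(125)
print(scanned, hits)  # only the last block is read: 3 [150, 180, 200]
```

Here a predicate of `> 125` skips the first two blocks outright, scanning 3 of 10 rows -- the same effect, in miniature, as BLU narrowing 10 gigabytes of in-memory data down to the 1 gigabyte a query actually needs.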
All these enhancements will undoubtedly reassure existing DB2 customers that the database roadmap is keeping up with state-of-the-art features. "What's novel in IBM's approach is that it's doing acceleration in several ways," analyst Robin Bloor of Bloor Group told InformationWeek. Whether it breaks new performance benchmarks and changes market share dynamics remains to be seen.
"The 25X figure is very aggressive, but when people are making purchase decisions, they do proof-of-concept benchmarks and put one database up against another," said Bloor. "You don't actually know what kind of performance you'll get until you've done the comparisons."
What IBM has yet to address with BLU, and what will likely require more extensive use of memory, is transaction processing as well as analytics. That's what SAP is doing with Hana, it's what Microsoft has announced it will do with project Hekaton (expected in 2015) and it's what Oracle is rumored to be working on for a future Oracle Database release.
"We do see an evolution of this technology beyond reporting and analytic workloads, but I can't comment on a timeframe for that," Tim Vincent, IBM fellow, VP and chief technology officer, told InformationWeek. If IBM follows the same pattern it took in introducing the BLU technologies, it may wait to see what others do before attempting to do them one better.