Column-store versus row-store? Structured data versus multi-structured data? Upgrades blend the best of both worlds.
Data warehousing specialist Teradata announced Thursday a battery of upgrades that promise to blur boundaries between database architectures and applications.
The Teradata 14 database, due in December, will bring column-store analysis and compression capabilities to a row-store database. Aster Data 5.0, a release set for early next year, will advance that database's ability to handle a mix of structured, semi-structured, and unstructured information, and Teradata also is adding an Aster appliance built on its hardware.
Beyond these headlines, Teradata also piled on database management and automation upgrades aimed at minimizing the administrative and maintenance requirements of both databases.
Bridging the boundaries between row-store and column-store databases is a big deal, and it's something several vendors have been working on. Teradata has always been in the row-store camp along with EMC Greenplum, IBM DB2 and Netezza, Oracle Database, Microsoft SQL Server, and others. Sybase IQ was the first commercially successful column-store database, and products including HP Vertica, Infobright, and ParAccel have delivered variations on the column-store architecture.
Column-store databases have an advantage when you only need to query selected columnar attributes of data, like all the zip codes, product SKU numbers, and transaction dates in the database. That could tell you what sold where within the last month without wading through all the other data that might appear row by row, like the customer name, address, account number, and so on. Less data queried means faster results.
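To make the difference concrete, here is a minimal Python sketch that counts how many stored values each layout has to touch to answer a "what sold where" question. The table, its column names, and the helper functions are hypothetical illustrations, not any vendor's storage engine, and real databases read disk blocks rather than individual values.

```python
# Hypothetical toy table; names and values are illustrative only.
rows = [
    {"customer": "A. Smith", "address": "1 Main St", "account": 1001,
     "zip": "94105", "sku": "X-1", "date": "2011-09-01"},
    {"customer": "B. Jones", "address": "2 Oak Ave", "account": 1002,
     "zip": "10001", "sku": "Y-2", "date": "2011-09-03"},
    {"customer": "C. Brown", "address": "3 Elm Rd", "account": 1003,
     "zip": "94105", "sku": "X-1", "date": "2011-09-07"},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def values_read_row_store(rows):
    """A row layout reads every field of every row, wanted or not."""
    return sum(len(row) for row in rows)


def values_read_column_store(columns, wanted):
    """A column layout reads only the columns the query names."""
    return sum(len(columns[name]) for name in wanted)


columns = to_columns(rows)
wanted = ["zip", "sku", "date"]
row_cost = values_read_row_store(rows)
column_cost = values_read_column_store(columns, wanted)
print(row_cost, column_cost)
```

In this toy case the query needs half the columns, so the columnar layout touches half as many values. The gap widens as records get wider and queries stay narrow.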
Column-store databases also do a great job at compression because the data in columns is consistent--all zip codes, all dates, all product SKU numbers and so on. That helps column stores achieve upwards of 30-to-1 or 40-to-1 compression, depending on the data, while row-store databases, including Teradata's, max out at about 4-to-1 compression.
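A small sketch shows why consistency helps. It uses run-length encoding, chosen here only because it is easy to follow; the article does not say which algorithms any of these products use, and the sample values are made up.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse each run of identical adjacent values into (value, count)."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Hypothetical sample data: two attributes of seven sales records.
zips = ["94105"] * 4 + ["10001"] * 3
skus = ["X-1"] * 5 + ["Y-2"] * 2

# Row layout: values from different columns are interleaved record by record.
row_major = [value for record in zip(zips, skus) for value in record]

# Column layout: each column's values sit next to one another.
column_major = zips + skus

row_runs = run_length_encode(row_major)
column_runs = run_length_encode(column_major)
print(len(row_runs), len(column_runs))
```

Interleaving breaks up every run, so the row layout gains nothing from the encoding, while the column layout collapses 14 values into four runs.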
Teradata says its 14 release will enable new and existing customers to mix and match columnar and row-based physical storage, choosing whichever best suits an application. When contact center agents at telcos or branch managers at a bank try to answer questions for customers, for example, their queries usually involve only a few attributes of a total customer record. That's just one example of when a column-store approach might yield significantly faster results.
Oracle introduced a Hybrid Columnar Compression feature with Exadata in 2008 that squeezes data to a claimed ratio of 10-to-1. EMC Greenplum introduced a blending of row-store and column-store approaches with its polymorphic data storage approach in 2009. And Aster Data, which was acquired by Teradata in March for $263 million in cash, introduced a hybrid row/column-store approach in 2010.
Oracle's hybrid feature does not support selective, columnar querying, so it doesn't speed querying significantly like a true column-store database. Aster does do selective querying, but it does not offer columnar compression, according to independent database analyst Curt Monash. As for EMC Greenplum and Teradata, "each offers different ways to mix column and row storage in the same table with each approach offering advantages," Monash said.
The biggest challenge for Teradata customers may be figuring out when to use a row-based versus column-based approach. "You'd be looking at data-access paths and data demographics to choose between row-store and column-store objects," said Scott Gnau, president of Teradata Labs, in an interview with InformationWeek. "But this is supportive of an enterprise data warehouse approach because it eliminates the temptation to extract certain sets of information and put it on a separate, column-based platform."
A key feature of the new columnar capability is automatic compression that chooses the best compression algorithms for each column of data and that dynamically changes the compression approach as data-access patterns change.
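Teradata did not describe how that selection works. As a rough illustration of the general idea, the sketch below tries several codecs from Python's standard library on each column and keeps whichever produces the smallest output. The codec list, the `pick_codec` helper, and the sample columns are all assumptions made for this example, and it weighs size only, not data-access patterns.

```python
import bz2
import lzma
import zlib

# Candidate codecs; this particular set is an assumption for illustration.
CODECS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def pick_codec(column_values):
    """Return (best_codec_name, sizes_by_codec) for one column of values."""
    raw = "\n".join(str(value) for value in column_values).encode("utf-8")
    sizes = {name: len(compress(raw)) for name, compress in CODECS.items()}
    sizes["none"] = len(raw)  # storing uncompressed is always an option
    best = min(sizes, key=sizes.get)
    return best, sizes


# Hypothetical columns: one highly repetitive, one nearly unique.
table = {
    "zip": ["94105"] * 1000,
    "account": list(range(100000, 101000)),
}

choices = {name: pick_codec(values) for name, values in table.items()}
for name, (best, sizes) in choices.items():
    print(name, best, sizes)
```

Re-running the selection as a column's contents change is one simple way to approximate the "dynamic" behavior described, though a production system would presumably also weigh decompression cost against how often the column is read.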
Gnau declined to offer data-compression claims, saying rates would vary depending on the data.
He did, however, predict that Teradata's columnar approach will outperform Oracle's Hybrid Columnar Compression.
Tackling Big Data
Teradata's announcement of the Aster Data 5.0 database and an Aster MapReduce Appliance, both planned for early next year, is a bit of a coming out party for Aster as a unit of the larger company. The database upgrades are incremental and the appliance is no surprise, but Teradata has an opportunity to put the Aster story on a bigger stage.
Teradata bought Aster to take advantage of the smaller company's innovation in blending analysis of structured data, semi-structured data, largely unstructured information, or a mix of all of the above. It does so with its SQL-MapReduce framework, which lets companies perform MapReduce processing on its SQL-based platform.
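For readers unfamiliar with the MapReduce pattern itself, here is a generic sketch in Python that counts page views per user from loosely structured log lines. It is not Aster's SQL-MapReduce syntax, which the article does not show; the log format and function names are hypothetical.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (user, 1) pair for each log line that names a user."""
    fields = dict(part.split("=", 1) for part in record.split() if "=" in part)
    if "user" in fields:
        yield fields["user"], 1


def reduce_phase(pairs):
    """Sum the emitted values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical semi-structured input: key=value tokens, some lines malformed.
logs = [
    "user=ann page=/home",
    "user=bob page=/cart extra",
    "user=ann page=/cart",
    "malformed line",
]

pairs = [pair for record in logs for pair in map_phase(record)]
counts = reduce_phase(pairs)
print(counts)
```

The map step tolerates records that do not fit a fixed schema, which is what makes the pattern attractive for semi-structured data; the reduce step produces a tidy result that a SQL query could then join against structured tables.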