Lesson 3: Compression Cuts Storage Cost
Better data compression saves on storage, and that's still important even as hardware costs per terabyte have declined. Column-store databases, such as HP Vertica, Infobright, ParAccel, and Sybase IQ, can achieve 30-to-1 or 40-to-1 compression while row-store databases, such as IBM DB2, Microsoft SQL Server, and MySQL, average 4-to-1 compression. That's because columnar data is consistent, containing all zip codes or all purchase order numbers, for example. Rows hold a mix of data, such as all the attributes associated with an individual customer--name, address, zip, purchase order number, and so on. The Aster Data and Oracle databases offer hybrid row/columnar features. Oracle's Hybrid Columnar Compression, for one, can crunch data at a 10-to-1 ratio.
Compression levels vary depending on the data, and keep in mind that column-store databases aren't always the best choice. If your queries call on many attributes, a row-store product may deliver better performance. Indeed, row-store databases are more commonly used for enterprise data warehouses handing a mix of queries whereas column-store databases more often power focused data marts. Column-store customers include digital-media measurement giant comScore, a Sybase IQ user since 1999, and fast-growing online network Interclick, which deployed ParAccel in 2009.
RECOMMENDED READING
Big Data A Big Backup Challenge
Big Data: Informatica Tackles The High-Velocity Problem
IBM Picks Hadoop To Analyze Large Data Volumes
Hadoop Big Data Startup Spins Out Of Yahoo
2 Ways Big-Data Analysis Pays Off
A Model For The Big Data Era
Machines Are Driving The Big-Data Era
NOAA CIO Tackles Big Data