QUERIES BY COLUMN
Unlike a conventional database that stores data in rows, a column-store database looks at data vertically. In the case of a customer database, for example, the row-store database queries across each customer record, while the column-store database can query selected columns. The row approach is great for OLTP, with lots of writes for each new customer transaction. Column-store is ideal for OLAP, with lots of reads against particular attributes. If you want to explore sales by ZIP code and product, your query will interrogate only the ZIP code and product SKU columns rather than wading through names, addresses, and other irrelevant attributes.
Another advantage of the column-store database is that it can apply optimized compression--upwards of 10-to-1 compression--because the data in each column is consistent (all names, for example). As long as your query isn't record-intensive-- that is, it doesn't involve a lot of attributes--the amount of data coming from a column-store is always less than from a conventional database, so it delivers faster performance.
Column-store databases are ideal for many applications. Telecommunications companies have billions of call data records that need to be in an enterprise data warehouse for billing and historical purposes, but columnar extracts can be placed in a data mart for marketing analysis. Big retailers, banks, and insurance companies take much the same approach to look for up-sell and cross-sell opportunities, fraud, and mortality trends, respectively.
The Internal Revenue Service launched one of the first column-store deployments in 1996, using Sybase IQ, in what began as a 2-TB warehouse. Back then, 2 TB was a very large database, says Jeff Butler, the IRS's director of research databases. The columnar database brought query times down from days to a matter of hours, "which was just a phenomenal boost in productivity," Butler says. "Today, we're at 158 TB, and the queries that once took hours now take seconds."
The IRS's analytic database stores population-level data (meaning no names or Social Security numbers) from every tax return over the last 10 years. The warehouse is used to explore filing and compliance behavior, as well as the potential impact of tax code changes.
The IRS went with a column-store database because it didn't know what data elements it would need to query from one question to the next, says Butler. With a static reporting application, you can use just about any database because you can do a lot of optimization, he says, "but that's not our situation."
(click image for larger view)