Software // Information Management
News
5/21/2008
06:30 PM
Connect Directly
RSS
E-Mail
50%
50%

Reinvent The Data Warehouse With Column-Store Databases And Appliances

These approaches beat out conventional databases in price and performance. Here's how to find products that fit with your company's data analysis needs.

QUERIES BY COLUMN

Unlike a conventional database that stores data in rows, a column-store database looks at data vertically. In the case of a customer database, for example, the row-store database queries across each customer record, while the column-store database can query selected columns. The row approach is great for OLTP, with lots of writes for each new customer transaction. Column-store is ideal for OLAP, with lots of reads against particular attributes. If you want to explore sales by ZIP code and product, your query will interrogate only the ZIP code and product SKU columns rather than wading through names, addresses, and other irrelevant attributes.

Another advantage of the column-store database is that it can apply optimized compression--upwards of 10-to-1 compression--because the data in each column is consistent (all names, for example). As long as your query isn't record-intensive-- that is, it doesn't involve a lot of attributes--the amount of data coming from a column-store is always less than from a conventional database, so it delivers faster performance.

Column-store databases are ideal for many applications. Telecommunications companies have billions of call data records that need to be in an enterprise data warehouse for billing and historical purposes, but columnar extracts can be placed in a data mart for marketing analysis. Big retailers, banks, and insurance companies take much the same approach to look for up-sell and cross-sell opportunities, fraud, and mortality trends, respectively.

The Internal Revenue Service launched one of the first column-store deployments in 1996, using Sybase IQ, in what began as a 2-TB warehouse. Back then, 2 TB was a very large database, says Jeff Butler, the IRS's director of research databases. The columnar database brought query times down from days to a matter of hours, "which was just a phenomenal boost in productivity," Butler says. "Today, we're at 158 TB, and the queries that once took hours now take seconds."

The IRS's analytic database stores population-level data (meaning no names or Social Security numbers) from every tax return over the last 10 years. The warehouse is used to explore filing and compliance behavior, as well as the potential impact of tax code changes.

The IRS went with a column-store database because it didn't know what data elements it would need to query from one question to the next, says Butler. With a static reporting application, you can use just about any database because you can do a lot of optimization, he says, "but that's not our situation."

Impact Assessment: Column Store Databases and Warehouse Appliances

(click image for larger view)

Previous
2 of 4
Next
Comment  | 
Print  | 
More Insights
The Agile Archive
The Agile Archive
When it comes to managing data, donít look at backup and archiving systems as burdens and cost centers. A well-designed archive can enhance data protection and restores, ease search and e-discovery efforts, and save money by intelligently moving data from expensive primary storage systems.
Register for InformationWeek Newsletters
White Papers
Current Issue
InformationWeek Tech Digest - July 22, 2014
Sophisticated attacks demand real-time risk management and continuous monitoring. Here's how federal agencies are meeting that challenge.
Flash Poll
Video
Slideshows
Twitter Feed
InformationWeek Radio
Live Streaming Video
Everything You've Been Told About Mobility Is Wrong
Attend this video symposium with Sean Wisdom, Global Director of Mobility Solutions, and learn about how you can harness powerful new products to mobilize your business potential.