Kimball University: Microsoft SQL Server Comes of Age for Data Warehousing
With new compression, partitioning and star schema optimization features, Microsoft's SQL Server 2008 is catching up with the state of the industry in data warehousing. Here's why these three capabilities are crucial for scalability and performance on any platform.
The Kimball Group offers courses and a Toolkit book that shows how to apply our data warehouse design principles and techniques specifically on the Microsoft platform, but it should be noted that we have always been and we remain vendor independent. Whether you're using the Microsoft SQL Server or another platform, Kimball Group encourages you to explore the potential of these three broadly available features in your environment.
More Software Insights
- Intelligent Management of WAS Applications: Reduce Cost, Complexity, and Errors
- Improving Customer Service in the Age of Digital Disruption
- Optimizing Customer Communications: The Key to Bank Growth
- Supply Chain Management Best Practice Guide: Three Rules for Building a Smarter Supply Chain
Speed the Queries With Database Compression
Database compression is exactly what it sounds like: compressing the data before it is stored on disk. While this sounds mundane as we've been using zip utilities to reduce file size for decades, moving compression into the database can make a significant difference in storage and performance.
The storage advantages of compression are obvious: smaller data size means you need less disk space. In SQL Server 2008, you should expect about a 50 percent reduction in storage requirements thanks to compression. This advantage passes through to the backup process, which should complete much faster with less data to backup. It's also possible to compress the backup without compressing the data in the database.
On the performance side, the DW/BI system is often constrained by disks, which are the slowest-performing component in the environment. By compressing data you reduce the amount of disk space used and therefore the number of disk reads required, in some cases by 50 percent or more. The tradeoff is that more CPU cycles are needed to decompress the data before it is delivered to the application, but as CPU speeds have increased, the penalty of this tradeoff has been reduced. As a result, compression can cut query time significantly for certain kinds of queries. Since compression happens at the data-definition level, database components, like the storage manager, can handle the compression process without changes in your BI applications.
There are many different types of compression, with the general rule being the greater the compression, the more CPU time required. Compression techniques can be applied within a row, across all rows on a page, or at the table, file, or database level. One form of row-level compression involves applying the variable length concept we already have in the VARCHAR data type to all fields where it might make a difference. For example, we might define a column like OrderDollars as Decimal(20,5) which would take 13 bytes in SQL Server 2005. The scale and precision of this might be necessary to support the few very large orders you get every year, but it leaves a lot of unused space in every row for most records in the fact table.