Software // Information Management
News
7/29/2007
10:22 PM
Connect Directly
RSS
E-Mail
100%
0%

Kimball University: Keep to the Grain in Dimensional Modeling

When developing fact tables, aggregated data is NOT the place to start. To avoid "mixed granularity" woes including bad and overlapping data, stick to rich, expressive, atomic-level data that's closely connected to the original source and collection process.

The BEEP grain illustrates why the atomic data is the place to start all designs. The atomic data is the most expressive data because it is the most precisely defined. Aggregated data, for example, store sales by product by month, can easily be derived from the atomic data, but necessarily must truncate or delete most of the dimensions of the atomic data. Aggregated data is NOT the place to start a design!

This design tip was motivated in part by an article that appeared recently in a trade magazine discussing the design of dimensional (star) schemas. The author, who otherwise writes pretty clearly on this subject, stated "a star needs to be defined by a set of business questions and [metrics are assigned] to stars based on common reporting and queries." This is terrible advice! A dimensional model that is designed around business questions and reports has no clear grain. It has lost its connection to the original data source and is hostage to the "report of the day" mentality that causes such a database to be tweaked and altered until no one can explain why a record is in the table or not.

When a dimensional design has lost its connection to the grain, it becomes vulnerable to a subtle problem, called mixed granularity, that is nearly impossible to fix. In this case, records in the same fact table may represent different physical measurement events that are not comparable or may even overlap. A simple example deviating from the BEEP grain would be a fact table containing "combo-pack" sales records in addition to the sales records of the individual items comprising the combo-pack. This is dangerous because without a careful constraint in the query tool or report, the sales of these items would be double counted. A corollary to keeping to the grain is "don't require the end user tools to correct problems with the grain."

Keeping to the grain means building physical fact tables around each atomic measurement event. These tables are the least difficult to implement and they provide the most durable and flexible foundation for addressing business questions and reports-of-the day.

Ralph kimball, founder of the Kimball Group, teaches dimensional data warehouse and ETL design through Kimball University and reviews large warehouses. He has four best-selling data warehousing books in print, including Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd Edition, co-authored by Margy Ross. Write to him at ralph@kimballgroup.com.

Previous
2 of 2
Next
Comment  | 
Print  | 
More Insights
The Agile Archive
The Agile Archive
When it comes to managing data, donít look at backup and archiving systems as burdens and cost centers. A well-designed archive can enhance data protection and restores, ease search and e-discovery efforts, and save money by intelligently moving data from expensive primary storage systems.
Register for InformationWeek Newsletters
White Papers
Current Issue
InformationWeek Tech Digest - August 27, 2014
Who wins in cloud price wars? Short answer: not IT. Enterprises don't want bare-bones IaaS. Providers must focus on support, not undercutting rivals.
Flash Poll
Video
Slideshows
Twitter Feed
InformationWeek Radio
Archived InformationWeek Radio
Howard Marks talks about steps to take in choosing the right cloud storage solutions for your IT problems
Sponsored Live Streaming Video
Everything You've Been Told About Mobility Is Wrong
Attend this video symposium with Sean Wisdom, Global Director of Mobility Solutions, and learn about how you can harness powerful new products to mobilize your business potential.