Kimball University: Handling Arbitrary Restatements of History
How do you cope with an executive's request to "bring back a time series of activity for all subscribers who were in platinum status as of X date," or "show me a time series of orders by sales region according to the sales organization as of Y"? Here's how data warehouse pros can cope with the common requirement to look back in time.
Most business users of the data warehouse/business intelligence system are content with looking at their information in one of two ways: by the current state of affairs, or by tracking history. For example, a sales manager developing a sales forecast wants to see sales for his or her region as that region is defined today. But that same sales manager confirming the compensation plan must correctly associate all sales with each salesperson, even if they used to report into a different region. Today's best-practice BI systems are designed for these alternative approaches.
But sometimes it's more complicated than that. Do our existing design techniques enable a business user to restate history as of an arbitrary date? As we describe in this article, the answer is "yes, but." Yes we can, but it's a challenging query for business users to construct. Luckily, a modest design change in the data model can improve the situation.
Manage Changes to Dimension Attributes
The Kimball Method advocates three main techniques for managing changes to dimension attributes:
Type 1: Restate history by updating the dimension attribute in place.
Type 2: Track history by adding a new row to the dimension table that contains the new view of the dimension member.
Type 3: Snapshot history by adding new columns to the dimension table that holds the attribute's values at a specific date, often year-end.
These techniques have stood the test of time. They meet the vast majority of business users' requirements for reporting and analysis. Dimension tables are joined to the fact table by a single column key, usually an integer, that's managed by the data warehouse ETL process. This simplicity pays us back at query time when the database engine has a relatively simple task — a task for which most database vendors have invested in tuning their engines. Dimensions with Type 1 and Type 2 attributes are smoothly incorporated into OLAP cubes as well, for high-performance querying and analysis.
Sometimes, though, users need to report using dimension attributes as of an arbitrary date. For example, "Bring back a time series of activity for all subscribers who were in platinum status as of December 12," or "Give me a time series of orders by sales region according to the sales organization as of July 15."
This requirement isn't directly met by the three classic dimension attribute change types. Type 1 gives us the dimension attributes as of today. Type 3 is not sufficiently flexible; we need to modify the data model, adding columns to the dimension table to encode the attributes for the alternative history. Type 2 is our best bet; the Type 2 attributes actually contain the information we need to reconstruct the dimension as of a past date.
The Agile ArchiveWhen it comes to managing data, don’t look at backup and archiving systems as burdens and cost centers. A well-designed archive can enhance data protection and restores, ease search and e-discovery efforts, and save money by intelligently moving data from expensive primary storage systems.
2014 Analytics, BI, and Information Management SurveyIT’s tried for years to simplify data analytics and business intelligence efforts. Have visual analysis tools and Hadoop and NoSQL databases helped? Respondents to our 2014 InformationWeek Analytics, Business Intelligence, and Information Management Survey have a mixed outlook.