Slowly Changing Dimensions Are Not Always as Easy as 1, 2, 3
How do you deal with changing dimensions? Hybrid approaches fill gaps left by the three fundamental techniques
Series of Type 3 Attributes
Say you have a dimension attribute that changes with a predictable rhythm, such as annually, and the business needs to summarize facts based on any historical value of the attribute (not just the historically accurate and current, as we've primarily been discussing). For example, imagine the product line is recategorized at the start of every fiscal year and the business wants to look at multiple years of historical product sales based on the category assignment for the current year or any prior year.
This situation is best handled with a series of type 3 dimension attributes. On every dimension row, have a "current" category attribute that can be overwritten, as well as attributes for each annual designation, such as "2004 category" and "2003 category." You can then group historical facts based on any annual categorization.
This seemingly straightforward technique isn't appropriate for the unpredictable changes we described earlier. Customer attributes evolve uniquely. You can't add a series of type 3 attributes to track the prior attribute values ("prior-1," "prior-2" and so on) for unpredictable changes, because each attribute would be associated with a unique point in time for nearly every row in the dimension table.
Balance Power against Ease of Use
Before using hybrid techniques to support sophisticated change tracking, remember to maintain the equilibrium between flexibility and complexity. Users' questions and answer sets will vary depending on which dimension attributes are used for constraining or grouping. Given the potential for error or misinterpretation, hide the complexity (and associated capabilities) from infrequent users.
Hybrid SCDs can sometimes take you for a dizzying ride. Check with your ETL vendors to see if they support any of these techniques in their tools. Some do, which will greatly ease the burden on you.
The Data Warehouse Toolkit, 2nd Edition: The Complete Guide to Dimensional Modeling by R. Kimball & M. Ross (Wiley, 2002)
Ralph Kimball, founder of the Kimball Group, teaches dimensional data warehouse and ETL design through Kimball University and critically reviews large warehouses. He has four best-selling data warehousing books in print, including The Data Warehouse ETL Toolkit (Wiley, 2004). Write to him at firstname.lastname@example.org.
Margy Ross is president of the Kimball Group. she cowrote The Data Warehouse Lifecycle Toolkit (Wiley, 1998) and The Data Warehouse Toolkit, 2nd Edition (Wiley, 2002). Write to her at email@example.com.
The Agile ArchiveWhen it comes to managing data, donít look at backup and archiving systems as burdens and cost centers. A well-designed archive can enhance data protection and restores, ease search and e-discovery efforts, and save money by intelligently moving data from expensive primary storage systems.
2014 Analytics, BI, and Information Management SurveyITís tried for years to simplify data analytics and business intelligence efforts. Have visual analysis tools and Hadoop and NoSQL databases helped? Respondents to our 2014 InformationWeek Analytics, Business Intelligence, and Information Management Survey have a mixed outlook.