When The Satisfaction Indicator Is Also A Fact
A common dilemma arises in shipment invoice tracking fact tables. We often have a set of standard satisfaction indicators including on-time description (on-time or late), order complete description (complete or partial), and damage free description (damage free or damaged). These are handled by a simple satisfaction dimension as shown in the diagram at right.
A simple satisfaction dimension.
|(click for larger image)|
But in these situations it is also traditional to compute numeric measures of satisfaction such as Percent On Time, which would be awkward to compute from the satisfaction dimension. In this case, we confidently add counts consisting of ones and zeros to the fact table itself. See the last three fields in the fact table at right. Admittedly these ones and zeros are completely redundant with the contents of the satisfaction dimension, but they serve a different purpose. These ones and zeros are the grist for computations, not constraints or row labels. It's okay to use the extra disk space for this design! These kinds of binary satisfaction measures that straddle the world of facts and dimension attributes are relatively rare, but they are powerful when they can be used.
The Unpredictable, Chaotic List
In some businesses, a standard set of reliable satisfaction indicators is not available, but there may be a wealth of incompatible information from various sources. We can't use the fixed-column causal-dimension approach described in the first section because there might be hundreds of columns, most of which aren't filled in for a given transaction event. For example, suppose that we are capturing mortgage loan application information. These applications could include a mind-boggling array of demographic and financial context information, with the available content of this information changing rapidly over time. In this case, the preferred model uses a bridge table between the transaction fact table and a satisfaction dimension as shown at right.
Using a bridge table between the transaction fact table and a satisfaction dimension.
|(click for larger image)|
The foreign-key-to-foreign-key join shown above is not a mistake. This is perfectly legal and well defined in relational databases and SQL. In this case we assemble a potentially unique set of satisfaction indicators for a given mortgage loan transaction and give this set a satisfaction group primary key in our back room ETL processes. There are as many records in the bridge table for a specific satisfaction group as there are indicators that we have assembled for this specific (loan approval) transaction instance. This style of design has powerful advantages and powerful caveats. It's powerful because it is incredibly flexible. New satisfaction indicators and sets of indicators can be introduced constantly without changing the schema design and without having any NULL satisfaction entries. But this schema is awkward to query. For instance, if we are looking for satisfaction groups that involved a multi-unit condominium, federal loan assistance, and no former bankruptcy, we would need SQL like the following:
SELECT B1. GROUP_KEY
FROM SATISFACTION_BRIDGE B1
FROM SATISFACTION_BRIDGE B2, SATISFACTION D2
WHERE B2.INDICATOR_KEY = D2.INDICATOR_KEY
AND B1.GROUP_KEY = B2.GROUP_KEY
AND (D2.DESCRIPTION = 'Multi-Unit Condominium' or
D2.DESCRIPTION = 'Federal Loan Assistance' or
D2.DESCRIPTION = 'No Former Bankruptcy'))
We need the count to be 3 because we are only interested in satisfaction sets that contain all three conditions. Also note that we must test for "No Former Bankruptcy" explicitly, rather than searching for sets that do not contain a record of bankruptcy. This is because we may not be sure that the lack of a bankruptcy indicator is a reliable record that no bankruptcy occurred. It may just mean that the information was not collected.
This SQL is far beyond the reach of an ad hoc end user, so we must package such queries under a user interface application that gives the user various choices of possible satisfaction indicators and lets them point and click.
This article has described three powerful designs for capturing customer satisfaction indicators and attaching them to low-level transaction fact records. Marketing analysts often find these indicators useful for teasing out the real reasons customers engage our businesses.
Ralph Kimball is the founder of the Kimball Group and the data warehouse/business intelligence thought leader on the dimensional approach since the mid 1980s. He has trained more than 10,000 IT professionals and sold more than 250,000 copies of his Toolkit books. Write him at Ralph@kimballgroup.com.