Kimball University: Extreme Status Tracking For Real Time Customer Analysis

Customer interactions create a wealth of timely data that marketing departments are eager to exploit. The customer status fact table provides a central switchboard for using this fast-moving data.

InformationWeek Staff, Contributor

June 21, 2010

8 Min Read
InformationWeek logo in a gray background | InformationWeek

Ralph Kimball

We live in a world of extreme status tracking, where our customer-facing processes are capable of producing continuous updates on the transactions, locations, online gestures, and even the heartbeats of customers. Marketing folks and operational folks love this data because real-time decisions can be made to communicate with the customer. They expect these communications to be driven by a hybrid combination of traditional data warehouse history and up-to-the-second status tracking. Typical communications decisions include whether to recommend a product or service, or judge the legitimacy of a support request, or contact the customer with a warning.

As designers of integrated enterprise data warehouses (EDWs) with many customer-facing processes, we must deal with a variety of source operational applications that provide status indicators or data-mining-based behavioral scores we would like to have as part of the overall customer profile. These indicators and scores can be generated frequently, maybe even many times per day; we want a complete history that may stretch back months or even years. Though these rapidly changing status indicators and behavior scores are logically part of a single customer dimension, it is impractical to embed these attributes in a Type 2 slowly changing dimension. Remember that Type 2 perfectly captures history, and requires you to issue a new customer record each time any attribute in the dimension changes. Kimball Group has have long pointed out this practical conflict by calling this situation a "rapidly changing monster dimension." The solution is to reduce the pressure on the primary customer dimension by spawning one or more "mini-dimensions" that contain the rapidly changing status or behavioral attributes. We have talked about such mini-dimensions for at least a decade.

In our real-time, extreme status tracking world, we can refine the tried-and-true mini-dimension design by adding the following requirements. We want a "customer status fact table" that is...

  • a single source that exposes the complete, unbroken time series of all changes to customer descriptions, behavior, and status;

  • minutely time-stamped to the second or even the millisecond for all such changes;

  • scalable, to allow new transaction types, new behavior tags, and new status types to be added constantly, and scalable to allow a growing list of millions of customers each with a history of thousands of status changes;

  • accessible, to allow fetching the current, complete description of a customer and then quickly exposing that customer's extended history of transactions, behavior and status; and

  • usable as the master source of customer status for all fact tables in the EDW.

Our recommended design is the Customer Status Fact table approach shown in the figure below.

Kimball U

The Customer Status Fact table records every change to customer descriptions, behavior tags, and status descriptions for every customer. The transaction date dimension is the calendar date of the change and provides access to the calendar machinery that lets an application report or constrain on complex calendar attributes such as holidays, fiscal periods, day numbers, and week numbers.

The customer dimension contains relatively stable descriptors of customers, such as name, address, customer type, and date of first contact. Some of the attributes in this dimension will be Type 2 SCD (slowly changing dimension) attributes that will add new records to this dimension when they change, but the very rapidly changing behavior and status attributes have been removed to mini-dimensions. This is the classic response to a rapidly changing monster dimension. The Most Recent Flag is a special Type 1 field that is set to True only for the current valid customer record. All prior records for a given customer have this field set to False.

The customer durable key is what we normally designate as the natural key, but we call it durable to emphasize that the EDW must guarantee that it never changes, even if the source system has a special business rule that can cause it to change (such as an employee number that is re-assigned if the employee resigns and then is rehired). The durable key can be administered as a meaningless, sequentially assigned integer surrogate key in those cases where more than one source system provides conflicting or poorly administered natural keys. The point of the durable key is for the EDW to get control of the customer keys once and for all.

The customer surrogate key is definitely a standard surrogate key, sequentially assigned in the EDW back room every time a new customer record is needed, either because a new customer is being loaded or because an existing customer undergoes a Type 2 SCD change.

The double-dashed join lines shown in the figure are a key aspect of extreme status processing. When a requesting application sets the most recent flag to True, only the current profiles are seen. The customer surrogate key allows joining to the status fact table to grab the precise current behavior tags and status indicators. In a real-time environment, this is the first step in determining how to respond to a customer. But the customer durable key can then be used as an alternate join path to instantly expose the complete history of the customer we have just selected. In a real-time environment, this is the second step in dealing with the customer. We can see all the prior behavior tags and status indicators. We can compute counts and time spans from the customer status fact table. The behavior dimension can be modeled in two ways. The simpler design is a wide dimension with a separate column for each behavior tag type. Perhaps these behavior tags are assigned by data mining applications that monitor the customer's behavior. If the number of behavior tag types is small (less than 100), this design works very well because query and report applications can discover and use the types at run time. New behavior tag types (and thus new columns in the behavior dimension) can be added occasionally without invalidating existing analysis applications.

A more complex behavior dimension design is needed when a very large and messy set of behavior descriptors is available. Perhaps you have access to a number of demographic data sources covering complicated overlapping subsets of your customer base. Or perhaps you have account application data containing financial asset information that is very interesting but can be described in many ways. In this case, you will need a dimensional bridge table. Kimball Group has described dimensional bridge table designs in previous articles. Search for "bridge tables" (in quotes) at www.kimballgroup.com.

The status dimension is similar to the behavior dimension but can probably always be a wide dimension with a separate column for each status type, simply because this dimension is more under your internal control than the behavior dimension.

The transaction dimension describes what provoked the creation of the new record in the customer status fact table. Transactions can run the gamut from conventional purchase transactions all the way to changes in any of the customer-oriented dimensions, including customer, behavior and status. The transaction dimension can also contain special priority or warning attributes that alert applications to highly significant changes somewhere in the overall customer profile.

The begin and end effective date/times are ultra-precise, full-time stamps for when the current transaction became effective and when the next transaction became effective (superseding the current one). Kimball Group has given a lot of thought to these ultra-precise time stamps and we recommend the following design:

  • The grain of the time stamps should be as precise as your DBMS allows, at least down to the individual second. Some day in the future, you may care about time stamping some behavioral change in such a precise way.

  • The end effective time stamp should be exactly equal to the begin time stamp of the next (superseding) transaction, not "one tick" less. You need to have a perfect unbroken set of records describing your customer without any possibility of miniscule gaps because of your choice of a "tick".

  • In order to find a customer profile at a specific point in time, you won’t be able to use BETWEEN syntax because of the preceding point. You will need something like

    #Nov 2, 2009: 6:56:00# >= BeginEffDateTime and #Nov 2, 2009: 6:56:00# < EndEffDateTime

    as your constraint, where Nov 2, 2009, 6:56am is the desired point in time.

The customer status fact table is the master source for the complete customer profile, gathering together standard customer information, behavior tags, and status indicators. This fact table should be the source for all other fact tables involving customer. For example, an orders fact table would benefit from such a complete customer profile, but the grain of the orders fact table is drastically sparser than the customer status fact table. Use the status fact table as the source of the proper keys when you create an orders fact record in the back room. Decide on the exact effective date/time of the orders record, and grab the customer, behavior, and status keys from the customer status fact table and insert them into the orders table. This ETL processing scenario can be used for any fact table in the EDW that has a customer dimension. In this way, you add considerable value to all these other fact tables.

This article has described a scalable approach for extreme customer status tracking. The move toward extreme status tracking has been coming on like an express train, driven both by customer facing processes that are capturing micro-behavior, and by marketing’s eagerness to use this data to make decisions. The customer status fact table is the central switchboard for capturing and exposing this exciting new data source.

Never Miss a Beat: Get a snapshot of the issues affecting the IT industry straight to your inbox.

You May Also Like


More Insights