Kimball University: The Subsystems of ETL Revisited
(Page 2 of 3)
DELIVERING: PREPARE FOR PRESENTATION
The primary mission of the ETL system is the handoff of the dimension and fact tables in the delivery step. There is considerable variation in source data structures and cleaning and conforming logic, but the delivery processing techniques are more defined and disciplined. Careful and consistent use of these techniques is critical to building a successful dimensional data warehouse that is reliable, scalable and maintainable.
Many of these subsystems focus on dimension table processing. Dimension tables are the heart of the data warehouse. They provide the context for the fact tables and hence for all the measurements. For many dimensions, the basic load plan is relatively simple: perform basic transformations to the data to build dimension rows to be loaded into the target presentation table.
Preparing fact tables is certainly important as they hold the key measurements of the business that users want to see. Fact tables can be very large and time consuming to load. However, preparing fact tables for presentation is typically more straightforward.
The delivery systems in the ETL architecture consist of:
Slowly Changing Dimension (SCD) Manager (subsystem 9) — Implements logic for slowly changing dimension attributes.
Surrogate Key Generator (subsystem 10) — Produces surrogate keys independently for every dimension.
Hierarchy Manager (subsystem 11) — Delivers multiple, simultaneous, embedded hierarchical structures in a dimension.
Special Dimensions Manager (subsystem 12) — Creates placeholders in the ETL architecture for repeatable processes supporting an organization’s specific dimensional design characteristics, including standard dimensional design constructs such as junk dimensions, mini-dimensions and behavior tags.
Fact Table Builders (subsystem 13) — Construct the three primary types of fact tables: transaction grain, periodic snapshot and accumulating snapshot.
Surrogate Key Pipeline (subsystem 14) — Replaces operational natural keys in the incoming fact table record with the appropriate dimension surrogate keys.
Multi-Valued Bridge Table Builder (subsystem 15) — Builds and maintains bridge tables to support multi-valued relationships.
Late Arriving Data Handler (subsystem 16) — Applies special modifications to the standard processing procedures to deal with late-arriving fact and dimension data.
Dimension Manager (subsystem 17) — Centralized authority who prepares and publishes conformed dimensions to the data warehouse community.
Fact Table Provider (subsystem 18) — Owns the administration of one or more fact tables and is responsible for their creation, maintenance and use.
Aggregate Builder (subsystem 19) — Builds and maintains aggregates to be used seamlessly with aggregate navigation technologies for enhanced query performance.
OLAP Cube Builder (subsystem 20) — Feeds data from the relational dimensional schema to populate OLAP cubes.
Data Propagation Manager (subsystem 21) — Prepares conformed, integrated data from the data warehouse presentation server for delivery to other environments for special purposes.