Kimball University: The Subsystems of ETL Revisited - InformationWeek
Software // Information Management
05:55 PM

Kimball University: The Subsystems of ETL Revisited

These 34 subsystems cover the crucial extract, transform and load architecture components required in almost every dimensional data warehouse environment. Understanding the breadth of requirements is the first step to putting an effective architecture in place.


The primary mission of the ETL system is the handoff of the dimension and fact tables in the delivery step. There is considerable variation in source data structures and cleaning and conforming logic, but the delivery processing techniques are more defined and disciplined. Careful and consistent use of these techniques is critical to building a successful dimensional data warehouse that is reliable, scalable and maintainable.

Many of these subsystems focus on dimension table processing. Dimension tables are the heart of the data warehouse. They provide the context for the fact tables and hence for all the measurements. For many dimensions, the basic load plan is relatively simple: perform basic transformations to the data to build dimension rows to be loaded into the target presentation table.

Preparing fact tables is certainly important as they hold the key measurements of the business that users want to see. Fact tables can be very large and time consuming to load. However, preparing fact tables for presentation is typically more straightforward.

The delivery systems in the ETL architecture consist of:

Slowly Changing Dimension (SCD) Manager (subsystem 9) — Implements logic for slowly changing dimension attributes.

Surrogate Key Generator (subsystem 10) — Produces surrogate keys independently for every dimension.

Hierarchy Manager (subsystem 11) — Delivers multiple, simultaneous, embedded hierarchical structures in a dimension.

Special Dimensions Manager (subsystem 12) — Creates placeholders in the ETL architecture for repeatable processes supporting an organization’s specific dimensional design characteristics, including standard dimensional design constructs such as junk dimensions, mini-dimensions and behavior tags.

Fact Table Builders (subsystem 13) — Construct the three primary types of fact tables: transaction grain, periodic snapshot and accumulating snapshot.

Surrogate Key Pipeline (subsystem 14) — Replaces operational natural keys in the incoming fact table record with the appropriate dimension surrogate keys.

Multi-Valued Bridge Table Builder (subsystem 15) — Builds and maintains bridge tables to support multi-valued relationships.

Late Arriving Data Handler (subsystem 16) — Applies special modifications to the standard processing procedures to deal with late-arriving fact and dimension data.

Dimension Manager (subsystem 17) — Centralized authority who prepares and publishes conformed dimensions to the data warehouse community.

Fact Table Provider (subsystem 18) — Owns the administration of one or more fact tables and is responsible for their creation, maintenance and use.

Aggregate Builder (subsystem 19) — Builds and maintains aggregates to be used seamlessly with aggregate navigation technologies for enhanced query performance.

OLAP Cube Builder (subsystem 20) — Feeds data from the relational dimensional schema to populate OLAP cubes.

Data Propagation Manager (subsystem 21) — Prepares conformed, integrated data from the data warehouse presentation server for delivery to other environments for special purposes.

2 of 3
Comment  | 
Print  | 
More Insights
Newest First  |  Oldest First  |  Threaded View
How Enterprises Are Attacking the IT Security Enterprise
How Enterprises Are Attacking the IT Security Enterprise
To learn more about what organizations are doing to tackle attacks and threats we surveyed a group of 300 IT and infosec professionals to find out what their biggest IT security challenges are and what they're doing to defend against today's threats. Download the report to see what they're saying.
Register for InformationWeek Newsletters
White Papers
Current Issue
IT Success = Storage & Data Center Performance
Balancing legacy infrastructure with emerging technologies requires laying a solid foundation that delivers flexibility, scalability, and efficiency. Learn what the most pressing issues are, how to incorporate advances like software-defined storage, and strategies for streamlining the data center.
Twitter Feed
InformationWeek Radio
Archived InformationWeek Radio
Join us for a roundup of the top stories on for the week of November 6, 2016. We'll be talking with the editors and correspondents who brought you the top stories of the week to get the "story behind the story."
Sponsored Live Streaming Video
Everything You've Been Told About Mobility Is Wrong
Attend this video symposium with Sean Wisdom, Global Director of Mobility Solutions, and learn about how you can harness powerful new products to mobilize your business potential.
Flash Poll