Software // Information Management
05:55 PM

Kimball University: The Subsystems of ETL Revisited

These 34 subsystems cover the crucial extract, transform and load architecture components required in almost every dimensional data warehouse environment. Understanding the breadth of requirements is the first step to putting an effective architecture in place.


The primary mission of the ETL system is the handoff of the dimension and fact tables in the delivery step. There is considerable variation in source data structures and cleaning and conforming logic, but the delivery processing techniques are more defined and disciplined. Careful and consistent use of these techniques is critical to building a successful dimensional data warehouse that is reliable, scalable and maintainable.

Many of these subsystems focus on dimension table processing. Dimension tables are the heart of the data warehouse. They provide the context for the fact tables and hence for all the measurements. For many dimensions, the basic load plan is relatively simple: perform basic transformations to the data to build dimension rows to be loaded into the target presentation table.

Preparing fact tables is certainly important as they hold the key measurements of the business that users want to see. Fact tables can be very large and time consuming to load. However, preparing fact tables for presentation is typically more straightforward.

The delivery systems in the ETL architecture consist of:

Slowly Changing Dimension (SCD) Manager (subsystem 9) — Implements logic for slowly changing dimension attributes.

Surrogate Key Generator (subsystem 10) — Produces surrogate keys independently for every dimension.

Hierarchy Manager (subsystem 11) — Delivers multiple, simultaneous, embedded hierarchical structures in a dimension.

Special Dimensions Manager (subsystem 12) — Creates placeholders in the ETL architecture for repeatable processes supporting an organization’s specific dimensional design characteristics, including standard dimensional design constructs such as junk dimensions, mini-dimensions and behavior tags.

Fact Table Builders (subsystem 13) — Construct the three primary types of fact tables: transaction grain, periodic snapshot and accumulating snapshot.

Surrogate Key Pipeline (subsystem 14) — Replaces operational natural keys in the incoming fact table record with the appropriate dimension surrogate keys.

Multi-Valued Bridge Table Builder (subsystem 15) — Builds and maintains bridge tables to support multi-valued relationships.

Late Arriving Data Handler (subsystem 16) — Applies special modifications to the standard processing procedures to deal with late-arriving fact and dimension data.

Dimension Manager (subsystem 17) — Centralized authority who prepares and publishes conformed dimensions to the data warehouse community.

Fact Table Provider (subsystem 18) — Owns the administration of one or more fact tables and is responsible for their creation, maintenance and use.

Aggregate Builder (subsystem 19) — Builds and maintains aggregates to be used seamlessly with aggregate navigation technologies for enhanced query performance.

OLAP Cube Builder (subsystem 20) — Feeds data from the relational dimensional schema to populate OLAP cubes.

Data Propagation Manager (subsystem 21) — Prepares conformed, integrated data from the data warehouse presentation server for delivery to other environments for special purposes.

2 of 3
Comment  | 
Print  | 
More Insights
The Agile Archive
The Agile Archive
When it comes to managing data, donít look at backup and archiving systems as burdens and cost centers. A well-designed archive can enhance data protection and restores, ease search and e-discovery efforts, and save money by intelligently moving data from expensive primary storage systems.
Register for InformationWeek Newsletters
White Papers
Current Issue
InformationWeek Tech Digest August 03, 2015
The networking industry agrees that software-defined networking is the way of the future. So where are all the deployments? We take a look at where SDN is being deployed and what's getting in the way of deployments.
Twitter Feed
InformationWeek Radio
Sponsored Live Streaming Video
Everything You've Been Told About Mobility Is Wrong
Attend this video symposium with Sean Wisdom, Global Director of Mobility Solutions, and learn about how you can harness powerful new products to mobilize your business potential.