Software // Information Management
News
10/21/2007
05:55 PM
50%
50%

Kimball University: The Subsystems of ETL Revisited

These 34 subsystems cover the crucial extract, transform and load architecture components required in almost every dimensional data warehouse environment. Understanding the breadth of requirements is the first step to putting an effective architecture in place.

DELIVERING: PREPARE FOR PRESENTATION

The primary mission of the ETL system is the handoff of the dimension and fact tables in the delivery step. There is considerable variation in source data structures and cleaning and conforming logic, but the delivery processing techniques are more defined and disciplined. Careful and consistent use of these techniques is critical to building a successful dimensional data warehouse that is reliable, scalable and maintainable.

Many of these subsystems focus on dimension table processing. Dimension tables are the heart of the data warehouse. They provide the context for the fact tables and hence for all the measurements. For many dimensions, the basic load plan is relatively simple: perform basic transformations to the data to build dimension rows to be loaded into the target presentation table.

Preparing fact tables is certainly important as they hold the key measurements of the business that users want to see. Fact tables can be very large and time consuming to load. However, preparing fact tables for presentation is typically more straightforward.

The delivery systems in the ETL architecture consist of:

Slowly Changing Dimension (SCD) Manager (subsystem 9) — Implements logic for slowly changing dimension attributes.

Surrogate Key Generator (subsystem 10) — Produces surrogate keys independently for every dimension.

Hierarchy Manager (subsystem 11) — Delivers multiple, simultaneous, embedded hierarchical structures in a dimension.

Special Dimensions Manager (subsystem 12) — Creates placeholders in the ETL architecture for repeatable processes supporting an organization’s specific dimensional design characteristics, including standard dimensional design constructs such as junk dimensions, mini-dimensions and behavior tags.

Fact Table Builders (subsystem 13) — Construct the three primary types of fact tables: transaction grain, periodic snapshot and accumulating snapshot.

Surrogate Key Pipeline (subsystem 14) — Replaces operational natural keys in the incoming fact table record with the appropriate dimension surrogate keys.

Multi-Valued Bridge Table Builder (subsystem 15) — Builds and maintains bridge tables to support multi-valued relationships.

Late Arriving Data Handler (subsystem 16) — Applies special modifications to the standard processing procedures to deal with late-arriving fact and dimension data.

Dimension Manager (subsystem 17) — Centralized authority who prepares and publishes conformed dimensions to the data warehouse community.

Fact Table Provider (subsystem 18) — Owns the administration of one or more fact tables and is responsible for their creation, maintenance and use.

Aggregate Builder (subsystem 19) — Builds and maintains aggregates to be used seamlessly with aggregate navigation technologies for enhanced query performance.

OLAP Cube Builder (subsystem 20) — Feeds data from the relational dimensional schema to populate OLAP cubes.

Data Propagation Manager (subsystem 21) — Prepares conformed, integrated data from the data warehouse presentation server for delivery to other environments for special purposes.

Previous
2 of 3
Next
Comment  | 
Print  | 
More Insights
The Agile Archive
The Agile Archive
When it comes to managing data, donít look at backup and archiving systems as burdens and cost centers. A well-designed archive can enhance data protection and restores, ease search and e-discovery efforts, and save money by intelligently moving data from expensive primary storage systems.
Register for InformationWeek Newsletters
White Papers
Current Issue
InformationWeek Tech Digest, Dec. 9, 2014
Apps will make or break the tablet as a work device, but don't shortchange critical factors related to hardware, security, peripherals, and integration.
Video
Slideshows
Twitter Feed
InformationWeek Radio
Archived InformationWeek Radio
Join us for a roundup of the top stories on InformationWeek.com for the week of December 7, 2014. Be here for the show and for the incredible Friday Afternoon Conversation that runs beside the program!
Sponsored Live Streaming Video
Everything You've Been Told About Mobility Is Wrong
Attend this video symposium with Sean Wisdom, Global Director of Mobility Solutions, and learn about how you can harness powerful new products to mobilize your business potential.