Kimball University: Three ETL Compromises to Avoid
Why neglecting slowly changing dimensions, failing to capture metadata and overlooking scope creep can be the undoing of a dimensional data warehousing initiative.
Whether you are developing a new dimensional data warehouse or replacing an existing environment, the ETL (extract, transform, load) implementation effort is inevitably on the critical path. Difficult data sources, unclear requirements, data quality problems, changing scope, and other unforeseen problems often conspire to put the squeeze on the ETL development team. It simply may not be possible to fully deliver on the project team's original commitments; compromises will need to be made. In the end, these compromises, if not carefully considered, may create long-term headaches.
In my last article on "Six Key Decisions for ETL Architectures," I described the decisions ETL teams face when implementing a dimensional data warehouse. This article focuses on three common ETL development compromises that cause most of the long-term problems around dimensional data warehouses. Avoiding these compromises will not only improve the effectiveness of your ETL implementation, but will also increase the likelihood of overall DW/BI success.
Kimball Group has written extensively on slowly changing dimension (SCD) strategies and complementary implementation alternatives. It's important that the ETL team embrace SCDs as an important strategy early in the initial implementation process. A common compromise is to put off to the future the effort required to properly support SCDs, especially Type 2 SCDs where dimension changes are tracked by adding new rows to the dimension table. The result is often a total rework disaster.
Deferring the implementation of proper SCD strategies does save ETL development time in the immediate phase. But as a result, the implementation embraces only Type 1 SCDs, where all history in the data warehouse is associated with current dimension values. Initially, this seems to be a reasonable compromise. However, it's almost always more difficult to "do it right" when you have to circle back in a later phase. The unfortunate realities are that:
Following a successful initial implementation, the team faces pressure to roll out new capabilities and additional phases without time to revisit prior deliverables and add the required change-tracking capabilities. Thus, the rework ultimately required to support SCD requirements continues to expand.
Once the ETL team finally has the bandwidth to address SCD, the ugly truth becomes apparent. Adding SCD Type 2 capabilities into the historical data requires rebuilding every dimension that contains Type 2 attributes; each dimension will have to have its primary key rekeyed to reflect the new historically appropriate Type 2 rows. Rebuilding and rekeying even one core conformed dimension will unavoidably require reloading all impacted fact tables due to the new dimension key structures.
Facing a possible rebuild of much of the data warehouse environment, many organizations will back away from the effort. Rather than reworking the existing historical data to restate the dimension and fact tables in their correct historical context, they implement the proper SCD strategies from a point-in-time forward. By compromising the implementation of proper SCD techniques in the initial development process, the organization has lost possibly years of important historic context.
How Enterprises Are Attacking the IT Security EnterpriseTo learn more about what organizations are doing to tackle attacks and threats we surveyed a group of 300 IT and infosec professionals to find out what their biggest IT security challenges are and what they're doing to defend against today's threats. Download the report to see what they're saying.
IT Strategies to Conquer the CloudChances are your organization is adopting cloud computing in one way or another -- or in multiple ways. Understanding the skills you need and how cloud affects IT operations and networking will help you adapt.