Kimball University: Three ETL Compromises to Avoid
Why neglecting slowly changing dimensions, failing to capture metadata and overlooking scope creep can be the undoing of a dimensional data warehousing initiative.
Compromise 2: Failing to Embrace a Metadata Strategy
DW/BI environments spin off copious amounts of metadata. Business metadata, process metadata, and technical infrastructure metadata all need to be vetted, captured, and made available. The ETL processes alone generate significant amounts of metadata.
Unfortunately, many ETL implementation teams do not embrace metadata early in the development process, putting off its capture to a future phase. This compromise typically is made because the ETL team does not "own" the overall metadata strategy. In fact, in the early stages of many new implementation efforts, it's not uncommon for there to be no designated owner of the metadata strategy.
Lack of ownership and leadership makes it easy to defer dealing with metadata, but that's a short-sighted mistake. Much of the critical business metadata is identified and captured, often in spreadsheet form, during the dimensional-modeling and source-to-target mapping phases. What's more, most organizations use ETL tools to develop their environment, and these tools have capabilities to capture the most pertinent business metadata. Thus, the ETL development phase presents an opportune moment -- often squandered -- to capture richly described metadata. Instead, the ETL development team only captures the information required for their development purposes, leaving valuable descriptive information on the cutting room floor. Ultimately, in a later phase, much of this effort ends up being redone in order to capture the required information.
At a minimum, the ETL team should strive to capture the business metadata created during the data-modeling and source-to-target mapping processes. Most organizations find it valuable to focus initially on capturing, integrating, flowing, and, ultimately, surfacing the business metadata through their BI tool; other metadata can be integrated over time.
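As a rough illustration of what capturing this metadata might look like, the sketch below loads a source-to-target mapping exported from a spreadsheet and turns it into a simple data dictionary that could later be surfaced through a BI tool. The column names, the sample rows, and the helper names (`load_mapping`, `build_data_dictionary`, `missing_definitions`) are all assumptions made for the example; they do not correspond to any particular ETL tool or template.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical export of a source-to-target mapping spreadsheet.
# Column names and rows are illustrative only.
MAPPING_CSV = """target_table,target_column,source_system,source_field,business_definition,transformation
dim_customer,customer_name,CRM,cust_nm,Full legal name of the customer,Trim whitespace
dim_customer,customer_segment,CRM,seg_cd,Marketing segment assigned to the customer,Decode segment code to label
fact_sales,sales_amount,ERP,net_amt,Net sales amount in US dollars,Convert cents to dollars
"""


@dataclass(frozen=True)
class ColumnMetadata:
    """Business metadata for one target column, as recorded in the mapping."""
    target_table: str
    target_column: str
    source_system: str
    source_field: str
    business_definition: str
    transformation: str


def load_mapping(text):
    """Parse the mapping export into one ColumnMetadata record per row."""
    reader = csv.DictReader(io.StringIO(text))
    return [ColumnMetadata(**row) for row in reader]


def build_data_dictionary(entries):
    """Index the records by 'table.column' so they are easy to look up."""
    return {f"{e.target_table}.{e.target_column}": e for e in entries}


def missing_definitions(entries):
    """List the columns whose business definition was left blank."""
    return [
        f"{e.target_table}.{e.target_column}"
        for e in entries
        if not e.business_definition.strip()
    ]


entries = load_mapping(MAPPING_CSV)
dictionary = build_data_dictionary(entries)
print(dictionary["fact_sales.sales_amount"].business_definition)
print("Columns lacking a definition:", missing_definitions(entries))
```

Running a check like `missing_definitions` during ETL development, rather than in a later phase, is one way to keep descriptive information from being left behind.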
Compromise 3: Not Delivering a Meaningful Scope
The ETL team is often under the gun to deliver results under tight time constraints. Compromises must be made. Reducing the scope of the initial project can be an acceptable compromise. If, for example, a large number of schemas was included in the initial scope, one time-honored solution is to break that effort up into several phases. It's a reasonable, considered compromise, assuming the DW/BI project team and sponsors are all fully, if grudgingly, on board.
But it's a problem when the ETL team makes scope compromises without proactively communicating with the DW/BI project team and sponsors. Clearly, this is a recipe for failure and an unacceptable compromise.
This situation is often a symptom of deeper organizational challenges. It can start innocently enough, with shortcuts taken under pressure in the heat of the moment. In retrospect, however, these compromises would never have been made in the full light of day. In an effort to achieve overly ambitious deadlines, the ETL team might fail to handle data quality errors uncovered during the development process, fail to properly support late arriving data, neglect to fully test all ETL processes, or perform only cursory quality assurance checks on loaded data. These compromises lead to inconsistent reporting, an inability to tie into existing environments, and erroneous data, and they often lead to a total loss of confidence among business sponsors and users. The outcome can be total project chaos and failure.
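To make the idea of a more than cursory quality assurance check concrete, here is a minimal sketch that reconciles a loaded table against its source extract by row count, by business key, and by a summed measure. The function name `reconcile`, the field names `order_id` and `amount`, and the sample rows are hypothetical; a real environment would run comparable checks against the actual source and target tables.

```python
from decimal import Decimal


def reconcile(source_rows, loaded_rows, key, measure):
    """Compare a source extract with the loaded rows.

    Returns the row-count difference, keys present on only one side,
    and the difference between the summed measure values.
    """
    source_keys = {row[key] for row in source_rows}
    loaded_keys = {row[key] for row in loaded_rows}

    # Decimal avoids floating-point drift when totaling monetary amounts.
    source_total = sum(
        (Decimal(str(row[measure])) for row in source_rows), Decimal("0")
    )
    loaded_total = sum(
        (Decimal(str(row[measure])) for row in loaded_rows), Decimal("0")
    )

    return {
        "row_count_diff": len(source_rows) - len(loaded_rows),
        "missing_keys": sorted(source_keys - loaded_keys),
        "unexpected_keys": sorted(loaded_keys - source_keys),
        "measure_diff": source_total - loaded_total,
    }


# Illustrative data: one source row never made it into the warehouse.
source = [
    {"order_id": 1, "amount": "10.00"},
    {"order_id": 2, "amount": "25.50"},
    {"order_id": 3, "amount": "4.25"},
]
loaded = [
    {"order_id": 1, "amount": "10.00"},
    {"order_id": 2, "amount": "25.50"},
]

report = reconcile(source, loaded, key="order_id", measure="amount")
print(report)
```

A report with any nonzero difference or non-empty key list is the kind of finding that should be raised with the project team rather than quietly deferred.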
Make Compromises Openly and Honestly
Compromises may be necessary. The most common concession is to scale back an overly ambitious project scope, but key stakeholders need to be included in this decision. Other, less intrusive changes can be considered, such as reducing the number of years of back history used to seed a new environment, reducing the number of dimension attributes or metrics required in the initial phase (while being careful about SCD Type 2 requirements), or reducing the number of source systems integrated in the initial phase. Just keep everyone informed and on the same page. The key is to compromise in areas that do not put the long-term viability of the project at risk.