Kimball University: Four Fixes Refurbish Legacy Data Warehouses
DW/BI professionals are often tasked with making evolutionary upgrades and improvements to minimize cost and upheaval in the current analytic environment. We explore four upgrades that can breathe new life into legacy data warehouses.
Few readers have the luxury of working with a blank slate when it comes to the development of their data warehouse/business intelligence environment. Instead, many of us deal with the decisions--and potentially the sins--of our predecessors. Your DW/BI environment would likely look very different if you were to build it from scratch, but a complete toss-and-rebuild is seldom an alternative.
More often, DW/BI professionals are tasked with making evolutionary upgrades and improvements to minimize cost and upheaval of the current analytic environment. These four upgrades can breathe new life into legacy data warehouses.
Conform The Nonconformed Dimensions
The Kimball Group has steadfastly made the case for master conformed dimensions. Conformed dimensions contain the descriptive attributes and corresponding names, meanings and values that have been agreed to across the enterprise. Using conformed dimensions ensures that the data warehouse is delivering consistently defined attributes for labeling, grouping, filtering and integrating data from multiple business processes.
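To make the idea concrete, here is a minimal sketch (with invented product names and keys) of a single conformed product dimension shared by two business processes. Because both fact tables reference the same dimension rows, their rollups use identical category labels and can be compared side by side:

```python
# Hypothetical example: one conformed "product" dimension shared by two
# business processes (sales and inventory). Names and values are
# illustrative, not drawn from a real system.

# The conformed dimension: one agreed-upon row per product surrogate key.
product_dim = {
    1: {"product_name": "Widget", "category": "Hardware"},
    2: {"product_name": "Gadget", "category": "Electronics"},
}

# Two fact tables from different business processes reference the SAME keys.
sales_facts = [
    {"product_key": 1, "sales_amount": 100.0},
    {"product_key": 2, "sales_amount": 250.0},
]
inventory_facts = [
    {"product_key": 1, "on_hand_qty": 40},
    {"product_key": 2, "on_hand_qty": 15},
]

def rollup(facts, measure):
    """Group a fact table by the conformed 'category' attribute."""
    totals = {}
    for row in facts:
        category = product_dim[row["product_key"]]["category"]
        totals[category] = totals.get(category, 0) + row[measure]
    return totals

sales_by_category = rollup(sales_facts, "sales_amount")
stock_by_category = rollup(inventory_facts, "on_hand_qty")
```

Had each team built its own product dimension, the two rollups could use different category names and spellings, and the results could not be reliably integrated.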
Unfortunately, many data warehouses/marts were developed without regard for this critical master data. Standalone data stores with independently defined dimensions are often constructed because it's the path of least resistance when deadlines loom. Rather than attempt to reach consensus on common reference data, isolated teams believe it's quicker and easier to just build autonomous dimensions. This approach may let these teams declare victory, but it doesn't support the business' desire for integration and consistency.
Some organizations wind up with independent data stores because developers purposely focused on delivering a departmental solution, likely due to the funding available. Without the vision and acknowledged need for an enterprise perspective, teams are often chartered to build with blinders on to meet a limited set of goals.
So what do you do if you're confronted with an environment that's been built without a foundation of common conformed dimensions? Can these stovepipes be rescued? In spite of the vendor hype, there's no magic elixir that miraculously delivers master dimensions. Technology can facilitate and enable data integration, but there's no silver bullet. The first step toward integration nirvana is to assess the state of the data as well as requirements, expectations and buy-in from the business. You can self-diagnose the issues related to nonconformed dimensions, but keep in mind that you're likely to face a long, uphill internal struggle and resistance to change if the business community doesn't perceive the need for, or incremental value of, the project.
As we described in "Data Stewardship 101" (June 2006), one of the most crucial steps in conforming nonconformed dimensions is to organize the appropriate resources to tackle this vexing problem. Data stewards must be identified and assigned responsibility and authority to determine common dimension attributes, define domain values and transformation business rules, and establish ongoing processes to ensure data quality. Obviously, that's no small feat, so it's critical to identify the right leader. Ideally, you want someone from the business community who is respected by senior management and who has knowledge and skills to achieve organizational consensus. Navigating the unavoidable cross-functional challenges requires experience, widespread respect, political acumen and strong communication skills.
Not everyone is cut out to be a data steward. It's feasible for folks on the data warehouse team to serve as stewards, but they need to demonstrate all the traits and characteristics described above. Most important, they need the support of business management and the authority to push cross-enterprise agreement and adoption, even when unpopular compromise is required. Without this power, stewards face the prospect of endless tire spinning as they try to rationalize diverse perspectives.
Once the data stewards produce the specifications for a conformed master dimension, the skilled extract, transform and load (ETL) staff builds the master dimension. Depending on the existence and/or quality of reference data in the operational source systems, this may require intricate record matching and deduplication. When merging multiple source systems, clearly defined rules of survivorship are needed to identify which data source takes precedence for each attribute.
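The survivorship idea can be sketched as follows. This is a hypothetical illustration, assuming duplicate customer records have already been matched across a CRM and a billing system; the source names, attributes and precedence rules are invented, and a real implementation would be driven by the stewards' documented business rules:

```python
# Hypothetical sketch of attribute-level survivorship when merging
# matched duplicate records from multiple source systems.

# Matched duplicates for one customer, keyed by source system (assumed names).
matched_records = {
    "crm":     {"name": "Jane Smith", "email": None, "phone": "555-0100"},
    "billing": {"name": "J. Smith", "email": "jane@example.com", "phone": None},
}

# Survivorship rules: for each attribute, which sources win, in order.
precedence = {
    "name":  ["crm", "billing"],
    "email": ["billing", "crm"],
    "phone": ["crm", "billing"],
}

def survive(records, rules):
    """Build one master record, taking each attribute from the
    highest-precedence source that supplies a non-null value."""
    master = {}
    for attr, sources in rules.items():
        for src in sources:
            value = records.get(src, {}).get(attr)
            if value is not None:
                master[attr] = value
                break
    return master

master_customer = survive(matched_records, precedence)
# The master record takes the name and phone from CRM but falls back
# to the billing system for the e-mail address CRM lacks.
```

The point of the sketch is that precedence is decided per attribute, not per record: no single source is trusted for everything.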
With master dimensions built, it's then time to retrofit the existing data warehouses/marts with standardized data. New surrogate key mapping tables for each dimension are used to recast existing fact table rows. In addition, aggregate summary tables and cubes will likely need to be reconstructed. While the implications for the ETL system are inevitably significant, hopefully the impact on the business intelligence layer of these underlying physical table changes can be minimized with an abstraction layer using views, synonyms or your BI tool's metadata, depending on your platforms.
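The recasting step can be sketched in miniature. In this hypothetical example (invented keys and column names), a surrogate key mapping table translates each fact row's old local dimension key to the new master surrogate key; note that two old keys may collapse into one master key when deduplication merged them:

```python
# Hypothetical sketch of recasting existing fact rows via a surrogate
# key mapping table. Keys and column names are illustrative.

# Mapping table: old local dimension key -> new master surrogate key.
# Old keys 101 and 103 were found to be duplicates and share master key 1.
key_map = {101: 1, 102: 2, 103: 1}

old_fact_rows = [
    {"customer_key": 101, "amount": 50.0},
    {"customer_key": 102, "amount": 75.0},
    {"customer_key": 103, "amount": 25.0},
]

def recast(fact_rows, mapping, key_col):
    """Rewrite each fact row's dimension key using the mapping table."""
    return [{**row, key_col: mapping[row[key_col]]} for row in fact_rows]

recast_rows = recast(old_fact_rows, key_map, "customer_key")
# Any aggregate tables or cubes built on the old keys are now stale
# and must be rebuilt from the recast fact rows.
```

In a real warehouse this would be a bulk SQL update or a reload joining the fact table to the mapping table, but the logic is the same key-for-key substitution.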