Kimball University: Eight Guidelines for Low-Risk Enterprise Data Warehousing
New data sources and BI delivery modes make it that much harder for EDW initiatives to succeed. Here are eight recommendations for controlling project costs and reducing risks.
In today's economic climate, business intelligence (BI) faces two powerful and conflicting pressures. On the one hand, business users want more focused insight from their BI tools into customer satisfaction and profitability. On the other hand, these same users are under huge pressure to control costs and reduce risks. The explosion of new data sources and new delivery modes available for BI makes this dilemma especially acute.
How can we fail?
We could do nothing, thereby overlooking important customer insights and specific areas where we could be more profitable. We could start a task force to produce a grand architectural specification covering the next five years, which is just another way of doing nothing. We could implement several high-priority spot solutions, ignoring overall enterprise integration. We could start by buying a big piece of iron, believing that it is so powerful that it will handle any type of data, once we decide what that data is.
You get the idea. Even though some of these ways to fail seem obviously dumb, we can nevertheless find ourselves in these positions when we respond with a crisis mentality.
How can we succeed? How can we move forward quickly and decisively while at the same time clamping down on risk? EDW development is never easy, but this article presents eight guidelines for approaching this intimidating task in a flexible, reasonably low-risk way.
Work on the Right Thing
We recommend a simple technique for deciding what the right thing is. Make a list of all your potential EDW/BI projects and place them on a simple 2x2 grid of business impact versus feasibility.
Figure out, with your end users, how valuable each of the potential projects would be, independent of the feasibility. Next, do an honest assessment of whether each project has high-quality data and how difficult it will be to build the data delivery pipelines from the source to the BI tool. Remember that at least 70 percent of BI project risks and delays come from problems with the data sources and meeting data delivery freshness (latency) requirements.
Once projects have been placed on the grid, start in the upper right corner, where impact and feasibility are both high. Project A, which sits there, has high business impact and is eminently feasible. Don't take the easy way out and start with low-risk project D. That project may be feasible, but even if you do a great job, it won't have much impact. Similarly, don't start with project C. The users would love to have it, but there are big feasibility issues, which translate into big risks.
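The grid exercise can be sketched in a few lines of Python. This is only an illustration: the 1-to-10 scores, the cutoff of 5, project B, and the names Project, quadrant and prioritize are all assumptions made for the example, not part of the technique itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    name: str
    impact: int       # business value, 1 (low) to 10 (high), scored with end users
    feasibility: int  # data quality and pipeline ease, 1 (hard) to 10 (easy)


def quadrant(p: Project, cutoff: int = 5) -> str:
    """Return the grid cell for a project; scores above the cutoff count as high."""
    impact = "high" if p.impact > cutoff else "low"
    feasibility = "high" if p.feasibility > cutoff else "low"
    return f"{impact} impact / {feasibility} feasibility"


def prioritize(projects: list[Project], cutoff: int = 5) -> list[Project]:
    """Keep only projects that are high on both axes, strongest first."""
    shortlist = [p for p in projects if p.impact > cutoff and p.feasibility > cutoff]
    return sorted(shortlist, key=lambda p: p.impact + p.feasibility, reverse=True)


# Hypothetical scores chosen to mirror projects A, C and D from the text.
projects = [
    Project("A", impact=9, feasibility=8),
    Project("B", impact=3, feasibility=2),
    Project("C", impact=9, feasibility=2),
    Project("D", impact=2, feasibility=9),
]

for p in projects:
    print(p.name, "->", quadrant(p))
print("Start with:", [p.name for p in prioritize(projects)])
```

The scoring itself is the hard part and has to come from honest conversations with the users and the source system owners; the code only makes the resulting ranking explicit.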
Give Business Users Control
A few years ago, data warehousing was relabeled as "business intelligence." This relabeling was far more than a marketing tactic because it correctly signaled the transfer of the initiative and ownership of the data assets to the business users. Everyone knows instinctively that they can do a better job if they can see the right data. Our job in IT is to sort through all the technology in order to give the users what they want.
The transfer of control means having users directly involved with, and responsible for, each EDW/BI project. Obviously these users have to learn how to work with IT so as to make reasonable demands. The impact-feasibility grid shown above is not a bad place to start.
Proceed Incrementally
In this era of financial uncertainty, it's hard to justify a classic "waterfall" approach to EDW/BI development. In the waterfall approach, a written functional specification is created that completely specifies the sources, the final deliverables and the detailed implementation. The rest of the project implements this specification, often with a big-bang comprehensive release. The origins of the waterfall approach lie in the manufacturing industry, where changes after implementation are prohibitively costly. The problem with the waterfall approach for EDW/BI projects is that it takes too long and does not recognize the need to adapt to new requirements or changes in understanding.
Many EDW/BI projects are gravitating to what could be called an "agile" approach that emphasizes frequent releases and mid-course corrections. Interestingly, a fundamental tenet of the agile approach is ownership by the business users, not by technical developers.
An agile approach requires tolerating some code rewriting and not depending on fixed-price contracts. The agile approach can successfully be adapted to enterprisewide projects such as master data management and enterprise integration. In these cases, the first few agile releases are not working code but rather architectural guidelines.
Start with Lightweight, Focused Governance
Governance is recognizing the value of your data assets and managing those assets responsibly. Governance is not something that is tacked onto the end of an EDW/BI project. Governance is part of a larger culture that recognizes the value of your data assets and is supported and driven by senior executives. At the level of an individual project, governance is identifying, cataloging, valuing, assigning responsibility, securing, protecting, complying, controlling, improving, establishing consistent practices, integrating across subject areas, planning for growth, planning to harvest value, and generally nurturing. Governance doesn't need a waterfall approach, but these issues need to be part of the project from the very start. Failing to think about governance can result in fundamental rework of the EDW/BI project.
Build a Simple, Universal Platform
One thing is certain in the BI space: the nature of the end-user-facing BI tools cannot be predicted. In the future, what's going to be more important: data mining, predictive analytics, delivery to mobile devices, batch reporting, real-time alerts, or something we haven't thought of yet? Fortunately, we have a good answer to this question: we must recognize that the enterprise data warehouse is the single platform for all forms of business intelligence. This viewpoint makes us realize that the EDW's interface to all forms of BI must be agnostic, simple and universal.
Dimensional modeling meets these goals as the interface to all forms of BI. Dimensional schemas contain all possible data relationships, but at the same time can be processed efficiently with simple SQL emitted by any BI tool.
Integrate Using Conformed Dimensions
Enterprisewide integration has risen to the top of the list of EDW/BI technical drivers along with data quality and data latency. Dimensional modeling provides a simple set of procedures for achieving integration that can be effectively used by BI tools. Conformed dimensions enable BI tools to drill across multiple subject areas, assembling a final integrated report. The key insight is that the entire dimension (customer, for example) does not need to be made identical across all subject areas. The minimum requirement for a drill-across report is that at least one field be common across multiple subject areas. Thus, the EDW can define a master enterprise dimension containing a small but growing number of conformed fields. These fields can be added incrementally over time. In this way, we reduce the risk and cost of enterprise integration at the BI interface. This approach also fits well with our recommendation to develop the EDW/BI system incrementally.
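A drill-across report built on a single conformed field might look like the following minimal sketch, which uses Python's built-in sqlite3 module. The table names, columns and sample rows are invented for illustration; the only thing the two customer dimensions share is the conformed field region.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each subject area keeps its own customer dimension; only 'region' is conformed.
    CREATE TABLE sales_customer_dim (customer_key INTEGER PRIMARY KEY, region TEXT, loyalty_tier TEXT);
    CREATE TABLE support_customer_dim (customer_key INTEGER PRIMARY KEY, region TEXT, contract_type TEXT);
    CREATE TABLE sales_fact (customer_key INTEGER, revenue REAL);
    CREATE TABLE support_fact (customer_key INTEGER, tickets INTEGER);

    INSERT INTO sales_customer_dim VALUES (1, 'East', 'gold'), (2, 'West', 'silver');
    INSERT INTO support_customer_dim VALUES (10, 'East', 'premium'), (11, 'West', 'basic');
    INSERT INTO sales_fact VALUES (1, 100.0), (1, 50.0), (2, 75.0);
    INSERT INTO support_fact VALUES (10, 3), (11, 1), (11, 4);
""")

# Step 1: query each subject area separately, grouped by the conformed field.
sales = dict(conn.execute("""
    SELECT d.region, SUM(f.revenue)
    FROM sales_fact f JOIN sales_customer_dim d USING (customer_key)
    GROUP BY d.region
"""))
support = dict(conn.execute("""
    SELECT d.region, SUM(f.tickets)
    FROM support_fact f JOIN support_customer_dim d USING (customer_key)
    GROUP BY d.region
"""))

# Step 2: merge the two answer sets on the conformed field to build the report.
report = {
    region: {"revenue": sales.get(region), "tickets": support.get(region)}
    for region in sorted(sales.keys() | support.keys())
}

for region, measures in report.items():
    print(region, measures)
```

Note that the two dimensions never need matching keys or identical columns. Adding a second conformed field later only widens the set of row headers a drill-across report can use, which is what makes the incremental approach workable.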
Manage Quality a Few Screens at a Time
In our articles and books, Kimball Group has described an effective approach to managing data quality by placing data quality screens throughout the data pipelines leading from the sources to the targets. Each data quality screen is a test. When the test fails or finds a suspected data quality violation, the screen writes a record in an error event fact table -- a dimensional schema hidden in the back room away from direct access by end users. The error event fact table lets EDW/BI administrators measure the volume and source of the errors encountered. A companion audit dimension summarizes the error conditions and is exposed to the end users along with every dimensional fact table.
The data quality screens can be implemented one at a time, allowing development of the data quality system to grow incrementally.
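As a rough sketch of the idea, each screen can be modeled as a named test, with failures appended to a list that stands in for the error event fact table. The names Screen, ErrorEvent and run_screens, the two sample screens, and the orders_feed source are all hypothetical, and the audit dimension is omitted for brevity.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Screen:
    name: str
    passes: Callable[[dict], bool]  # returns True when the row is acceptable


@dataclass
class ErrorEvent:
    screen_name: str
    source: str
    row_id: int
    detected_at: datetime


def run_screens(rows, source, screens, error_event_fact):
    """Apply every screen to every row and record one error event per failure."""
    for row in rows:
        for screen in screens:
            if not screen.passes(row):
                error_event_fact.append(
                    ErrorEvent(screen.name, source, row["id"], datetime.now(timezone.utc))
                )


# Start with one screen; adding another later is a one-line change.
screens = [Screen("amount_not_negative", lambda row: row["amount"] >= 0)]
screens.append(Screen("customer_id_present", lambda row: row["customer_id"] is not None))

rows = [
    {"id": 1, "amount": 20.0, "customer_id": "C1"},
    {"id": 2, "amount": -5.0, "customer_id": "C2"},
    {"id": 3, "amount": 12.0, "customer_id": None},
]

error_event_fact = []  # stand-in for the back-room error event fact table
run_screens(rows, "orders_feed", screens, error_event_fact)

# Administrators can then measure error volume by screen (or by source).
errors_by_screen = Counter(event.screen_name for event in error_event_fact)
print(errors_by_screen)
```

In a real pipeline the error events would be written to a database table and each screen would also decide whether to pass, flag or halt on the offending row; the sketch only records the failures.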
Use Surrogate Keys Throughout
Finally, a seemingly small recommendation to reduce your EDW/BI development risk: make sure to build all your dimensions (even Type 1 dimensions) with surrogate primary keys. This insulates you from surprises downstream when you acquire a new division that has its own ideas about keys. What's more, joins on compact integer surrogate keys generally run faster than joins on wide natural keys.
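The acquisition scenario can be illustrated with a small sketch. The class SurrogateKeyMap and the source system names are assumptions made for the example; in practice the mapping would live in a key lookup table maintained by the ETL system.

```python
from itertools import count


class SurrogateKeyMap:
    """Assign warehouse-owned integer keys to (source_system, natural_key) pairs."""

    def __init__(self):
        self._next_key = count(1)  # simple sequence: 1, 2, 3, ...
        self._keys = {}

    def key_for(self, source_system: str, natural_key: str) -> int:
        """Return the existing surrogate key for the pair, or assign a new one."""
        pair = (source_system, natural_key)
        if pair not in self._keys:
            self._keys[pair] = next(self._next_key)
        return self._keys[pair]


customer_keys = SurrogateKeyMap()

# The existing CRM and a newly acquired division both use "1001",
# but for two different customers.
crm_key = customer_keys.key_for("crm", "1001")
acquired_key = customer_keys.key_for("acquired_division", "1001")
crm_key_again = customer_keys.key_for("crm", "1001")

print(crm_key, acquired_key, crm_key_again)
```

Because fact tables carry only the warehouse-assigned integers, the colliding natural keys never reach the BI layer, and the new division can be loaded without rekeying anything that already exists.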
More Advice on Low-Risk EDWs
Many of these ideas have been described in Intelligent Enterprise as well as the Kimball Group Toolkit series of books and our monthly Design Tips. We are always interested in hearing your opinions about low-risk approaches to data warehousing. Write to me at [email protected].