Cliff Longman, CTO at Kalido, commented: "I think of BI as the car and data warehousing as the engine…Data warehouses should represent the historical view (and the "what if?" views as well if it is a business requirement) of data that a business relies on to judge its performance."
Cliff, we're pretty much in agreement. I think what we have is a problem of semantics (what a coincidence). We need to separate the data warehouse from data warehousing, which I think you did. The data warehouse is a repository of re-used information with historical context. Its use, going forward, will be diminished to some uncertain degree by advances in technology. It will not go away, at least not anytime soon. I have no quarrel with the data warehouse as a data source for reporting and analysis, but not as THE source.
Data warehousing, on the other hand, is a much broader concept that includes methodologies, products, designs, best practices (not always best) and a cottage industry of consultants and gurus. Many of the systems integrators, software and hardware vendors have their own methodologies. What is common across all of them is a focus on data and the process of handling it, and the amount of the budget they consume. People, business people, are usually a secondary concern, if they figure into the design at all. Instead, slogans like, "Provide the right data to the right people at the right time to improve decision making," elevate and distort the importance of provisioning data and misplace the cause of good decision making, which isn't data. Notice how architecture diagrams take up 99% of the page and "users" are represented as stick people or a few PCs at the far end with a few right-pointing arrows.
Another example of the inward focus of data warehousing is the slow adoption of metadata and, even now, its role primarily in the operation of the data warehouse process, particularly ETL. The BI vendors have been way ahead of data warehousing in the use of abstraction: semantic layers that isolate analytics from the physical schema and storage semantics of the data. Kalido is alone, I believe, in seeing the value of abstraction for managing change in the data warehouse process, but it is still dependent on the existing data warehouse model of staging, latency and design from the data outward. Perhaps you can demonstrate that this criticism of Kalido is not correct.
A more insidious problem is that the capabilities of BI tools have been constrained and distorted by the latent, read-only nature of data warehousing. This caused a rift between so-called analytical reporting and operational reporting because the transformation process of data warehousing (which is usually multi-step) makes it impossible to navigate backwards from analytical data to the actual operational data in original sources. It may be possible to examine transactional data at its lowest level of detail in the data warehouse (credit Kimball for demonstrating this), but navigating the keys is usually not possible. This artificial separation between the two causes the same kind of error and distortion that data warehousing was supposed to correct. Reports now are generated from either the data warehouse or directly against the operational systems, often yielding different answers.
I don't think the analogy of the car and the engine holds up. Data warehousing is not the engine of BI. The engine is the business itself and the creative minds of the people who use and need BI to get the job done. I don't know where I'd put the data warehouse in that picture, maybe the road. But data warehousing is not in the picture, except maybe as a traffic jam, because it impedes BI by draining resources from the analytical process for endless efforts at data management purity and frameworks and layers without regard (historically) to the real needs of the business, which are: to be informed, to synthesize and share new ideas and be able to react to changes quickly. The overhead of data warehousING stands in opposition to that.
This may sound like a harsh indictment of data warehousing, but if you spend some time with the people who use BI, you will quickly gain a sense of their need for a lot more than they are getting, which is why they vote with their feet and revert to their spreadsheets to get the work done. If you only poll data warehousing industry people, you will miss this. Even the "research" in this industry is tainted by both leading questions and reduced expectations on the part of those surveyed. People rarely ask for what they think is not available. It's just as easy for them to underestimate what IT can deliver as it is for IT to underestimate the complexity and sophistication of the work that people do on a regular basis.
Maria T Gonzalez commented that "Neil seems more interested in generating controversy around semantics than in real business benefits."
Thank you for your comment, Maria. If raising questions about the conventional wisdom is creating controversy, I plead guilty. I don't know why you're focusing on semantics, though. These comments were about BI and data warehousing. I do believe that the application of semantic technology to BI is already underway and will have a positive impact.
Richard Hackathorn commented: "As a long-time proponent of 'Single Version of the Truth' or whatever, I have seen the business value of data integration that gives a 'Consistent View of Business Reality.'"
Richard, I'm not really sure what you mean by Consistent View of Business Reality. "Consistent" has multiple meanings, such as a consistent composition of materials (like, say, concrete), but I suspect that's not what you mean. In a logical sense, "consistent" means that in a single formal system, no statement can be proven both true and false. Of course, Gödel proved that a consistent formal system powerful enough to express arithmetic cannot deliver proof of its own consistency, so who is there to evaluate if there really is a consistent view of business reality? How would you know if you have one? I'd suggest that any batch-oriented process is incapable of doing this and a historical record is likewise unable to harmonize "reality" over time. To me, this is the big problem with the Single Version of the Truth. Business reality is subject to interpretation. After all, there are separate "realities" for statutory and GAAP reporting, internal and external reporting. Risk portfolios are modeled by scenario for valuation. Where is the single version?
It's easy to mint pithy statements like "single version of the truth" and "consistent view of business reality," but where is the rigor? How can an industry the size of DW/BI operate with so little formality and absolutely no peer review of concepts like this that get taken up and accepted as gospel?
I don't think "recognizing" the conflict is sufficient. I may "recognize" the conflict in Iraq, but I'm not doing anything about it. What would the different parties talk about? They need some framework, some stake in the ground and some bounded rationality. What they need is a foundational model for BI, not one defined by vendors, IT and consultants. I've written about this before.
Neil Raden is the founder of Hired Brains, providers of consulting, research and analysis in Business Intelligence, Performance Management, real-time analytics and information/semantic integration. Don't miss Neil's many insightful articles in the Intelligent Enterprise archive.
These are extended responses to comments made in the original blog "Who Defines BI?"