The New Deal - InformationWeek


A model-driven approach to BI and data warehousing puts analytical power back in step with business requirements.

Data warehousing arrived 20 years ago to solve a set of problems that, to a large extent, no longer exist. Computing resources were once scarce and prohibitively expensive; dozens of homegrown systems scattered data everywhere and carried their own sets of semantics, business rules, and data management. Businesses simply could not get the information in a timely manner to do reporting, analysis, and planning.

Today, computing resources are practically boundless; though not inexpensive, the cost of servers, networks, and storage as a proportion of total project expenses has dropped dramatically. Data is more organized, in fewer systems, and getting better all the time as a result of general perceptions that it has value beyond the application programs that produce it — and that good data quality and architecture are enterprise requirements. Plus, organizations are benefiting from what the 1990s brought us: not only universal connectivity (the Internet) but also universal information access (the Web).

It stands to reason that, given these fundamental changes in the problem space, the solution set would change as well. But it hasn't. Data warehousing remains stubbornly focused on data and databases instead of information processes, business models, and closed-loop solutions. Methodologies and best practices for data warehousing have barely budged. Our approach to building data warehouses and business intelligence (BI) environments around them is out of step with the reality of today's information technology.

The Truth Hurts

A good example is the notion of a "single version of the truth." Many use this to justify the long, expensive construction of enterprise data warehouses that require person-years of modeling just to get started and extensive development to put into production — and many never get that far. The data warehouse never was a good repository of truth; it is merely a source for downstream applications, which add value to the data through application of formulas, aggregations, and other business rules. In addition, the functional reach of the source systems (such as ERP, customer relationship, and supply chain management) is now so broad that, unlike the operational systems of 20 years ago, much of the "truth" is already contained in and reported from them.

The rigidity of data warehouses endows them with calamitous potential. Current best practices in enterprise systems development call for formal approaches, such as the Zachman Framework. Because the architectures are designed first and then — based on the data modeler's grasp of requirements gathered at the start of the process — rendered into relational data models, organizations burn a version of reality into their data warehouse circuitry right from the beginning.

Unfortunately, no one knows ahead of time what the data warehouse requirements really will be, so the gathered "requirements" should be suspect from the beginning. And because relational data models lack the expressiveness to capture the subtleties and dynamics of the business, boiling requirements down into one takes us a step further away from a real working business model. Such designs have the fluidity of poured concrete: fluid only until set. Later, as the real requirements begin to emerge, the only choices are to jackhammer the design and start over (which rarely happens) or to add more layers of design (data marts, operational data stores, cubes, and so forth), gradually building a structure that is not only expensive and brittle but also difficult to maintain.
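
The "poured concrete" problem can be sketched concretely. In this minimal, hypothetical example (table and column names are illustrative, not from any real warehouse), a star schema bakes in the assumption that sales are analyzed only by product and date; when a new requirement arrives, the fact table's grain cannot be altered in place, so the design must be rebuilt:

```python
import sqlite3

# Hypothetical star schema: the model assumes sales are sliced only by
# product and date -- a "requirement" frozen at design time.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, iso_date TEXT);
    CREATE TABLE fact_sales  (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     REAL
    );
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "widget"), (2, "gadget")])
conn.execute("INSERT INTO dim_date VALUES (1, '2004-01-15')")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 1, 100.0), (2, 1, 250.0)])

# A real requirement emerges later: analyze sales by channel. The grain of
# fact_sales cannot be changed in place, so the design is "jackhammered":
# a new fact table is created, the rows copied, and the old table dropped.
conn.executescript("""
    CREATE TABLE fact_sales_v2 (
        product_id INTEGER,
        date_id    INTEGER,
        channel    TEXT DEFAULT 'unknown',
        amount     REAL
    );
    INSERT INTO fact_sales_v2 (product_id, date_id, amount)
        SELECT product_id, date_id, amount FROM fact_sales;
    DROP TABLE fact_sales;
""")
```

Note that every query, ETL job, and report written against the original `fact_sales` table is invalidated by the rebuild; in practice, that cost is why organizations choose the other option and bolt on more layers instead.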

Current methodologies stress the need for iterations, an indication that participants agree it's not possible to specify a data warehouse all at once. Never, however, is it made clear what's supposed to happen to the previous version of the data model. Or, for that matter, the data models, plural: a layered design like the Corporate Information Factory has no fewer than seven schema types, and potentially dozens of schemas in a single warehouse. Organizations direct extract, transform, and load (ETL) development against the physical schemas. Most BI tools interact directly with the physical schemas. Does it make sense to have each iteration invalidate previous efforts?
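
A model-driven alternative can be sketched in a few lines. In this hedged, hypothetical example (the mapping dictionaries, table names, and `build_query` helper are all illustrative, not any vendor's API), BI requests are stated against a small logical model rather than the physical schema, so a schema iteration changes only the mapping, not the requests written against it:

```python
# Hypothetical logical-to-physical mappings. Iteration 1 uses one physical
# design; in iteration 2 the modelers split out a revenue fact table.
# Only the mapping changes between iterations.
MAPPING_V1 = {"revenue": "fact_sales.amount",
              "product": "dim_product.name"}
MAPPING_V2 = {"revenue": "fact_revenue.net_amount",
              "product": "dim_product.name"}

def build_query(logical_columns, mapping):
    """Translate logical column names into a physical SELECT statement.

    Join conditions are omitted for brevity; a real semantic layer would
    also derive those from the model.
    """
    physical = [mapping[col] for col in logical_columns]
    tables = sorted({col.split(".")[0] for col in physical})
    return f"SELECT {', '.join(physical)} FROM {', '.join(tables)}"

# The BI request is stated once, in business terms...
request = ["product", "revenue"]

# ...and survives the schema iteration unchanged: only the generated SQL
# differs between physical designs.
q1 = build_query(request, MAPPING_V1)
q2 = build_query(request, MAPPING_V2)
```

The point of the sketch is the direction of the dependency: when ETL and BI artifacts reference the model instead of the physical schemas, an iteration refines the mapping rather than invalidating everything built on the previous design.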
