New information integration technologies, with help from Moore's Law and increasing standards acceptance, are becoming today's disruptive technologies. Data warehouse incumbents are in for a surprise if they don't pay attention.
Andy Hayler of Kalido, in an essay published at Intelligent Enterprise's Web site ("EII: Dead on Arrival"), argued that enterprise information integration (EII) was a half-baked idea that ran roughshod over the essential disciplines of data warehousing. Hayler felt that benefits claimed by EII solutions, which advocate a federated rather than centralized approach to data warehousing, were ill-founded and poorly conceived (my words, not his). Specifically, he argued, EII does not address historical data, data quality, and the performance of queries against systems not optimized for decision support.
Although I tend to agree with Andy on these points, not all EII vendors are alike. There's more to the story than just EII. Unfortunately, his arguments sound like the classic claims of an incumbent challenged by a disruptive technology. The irony is that Kalido has been a disruptive force itself with regard to traditional practices of designing and maintaining a data warehouse. However, data warehousing itself is now facing disruption.
Over the past decade and more, each major data warehousing component has had its 15 minutes of fame. In the beginning, the attention was focused on databases and database servers, particularly on their ability to scale to large volumes of batch updates and heavy query processing. Following closely behind were extract, transform, and load (ETL) tools, and — ever so briefly in the limelight — metadata. Seven or eight years ago, data modeling grabbed the spotlight, only to be upstaged by business intelligence (BI) tools.
This cavalcade of technology stardom was in fact just a shifting of focus. Far from disappearing, the components just blended into the background, their roles secure. Today, vendors are consolidating their portfolios to put as many of these components as they can under a single brand name. Could they, however, be missing a larger trend? Fundamental advances in information integration are already disrupting the value proposition of most of these technologies.
Rationalizing data across an enterprise is a problem so hard that it's only been attacked piecemeal. However, the stars are aligning for real progress. First, Moore's Law continues to deliver unbelievable hardware resources to power solutions at increasingly affordable prices. Second, the development of the Internet has put us on the road to universal connection and access. Third, e-business — alive and well although somewhat less glamorous — has fueled the development of business integration technologies, which reside primarily in the enterprise application integration (EAI) technology bucket.
Fourth, when data warehousing gave rise to ETL tools, it put data quality and metadata issues on the front burner. Today's obsession with governance, disclosure, and regulatory compliance is accentuating the demand to solve these issues, but the requirements are also driving a shift toward real-time reporting supported by EII.
Finally, other stars aligning include Web services, service-oriented architecture, and evolving information-sharing standards based on XML, the Semantic Web, Resource Description Framework (RDF), and ontologies, along with what the newer, major enterprise application vendors are doing to take advantage of these standards. Such steps include, first, turning their application suites into modular components that come together through integration infrastructure; and second, introducing tools and methodologies for creating agile business processes, especially through the standard Business Process Modeling Language (BPML). The conclusion is that many existing approaches to information integration urgently require rethinking.
Pursuit of the Truth
Data warehouses extract data from nonconformed systems and perform excruciating data cleansing. In many cases, the errors found are corrected in the data warehouse but never in the feeder systems themselves. Thus, a Sisyphean task is repeated endlessly as new, equally dirty data sources are added.
The cost of populating a new data warehouse design combined with the cost of maintaining the warehouse accounts for a substantial proportion of IT budgets. Because systems tend to be acquired or developed at different times and for different constituencies, inconsistencies introduced make integration even more difficult. If not done properly, integration becomes the conduit for erroneous information that's ultimately reported and acted upon.
To deal with the problem of bad information, organizations are on a quest for the "single version of the truth" (SVOT). Interestingly, data warehouse promoters signed up for the SVOT quest as a way of justifying their rather expensive projects. Without ever achieving the goal, many now use SVOT as a justification for the data warehouse: a bit of a non sequitur, in my view. While a well-designed data warehouse may be able to present a single version of the data, "truth" is more elusive; it arises from much more than just the back end of data warehousing. Business rules, models, metrics, presentations, and interpretations are components of the "truth," and these normally are beyond the scope of a relational database and its ETL processes. In addition, most organizations have multiple data warehouses, a reality that begs the question: How many SVOTs can there be?
The heart of a data warehouse is the data model. In most data warehouse methodologies, the first piece of design work is to construct a logical model, which eventually gets transformed into a physical model of tables, attributes, keys, indexes, views, and other database objects. Data modelers form the model out of their careful observation of the "requirements" of the system. In other words, the process starts with a container. While a well-designed data warehouse (and as we all know, too many aren't well designed) exhibits a great deal of flexibility in how it may be applied to business problems, most modeling processes are as fluid as concrete: Once set, they can't be altered without a jackhammer. (Since I mentioned Kalido earlier, I'd like to point out that this is precisely the problem the company's products have addressed quite well.)
Models may be extended, but usually what exists can't be changed without the following:
A logical model redesign
A physical model redesign
ETL routine modifications
Modification of all affected views, queries, and extracts
Absorbing the potential failure of downstream unauthorized applications, such as spreadsheets and personal databases
Reloading the data
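To make the cascade concrete, here is a minimal sketch, with hypothetical table and column names, of how a single new attribute ripples through the layers in the list above: the physical model, the ETL routine, and the dependent views all require coordinated edits.

```python
# Hypothetical illustration: adding one attribute ("region") to a
# customer dimension forces edits in every layer below.

# 1. Physical model change: the dimension table gains a column.
CUSTOMER_DIM_DDL = """
ALTER TABLE dim_customer ADD COLUMN region VARCHAR(40);
"""

# 2. ETL routine modification: the load mapping must now populate it.
def load_customer_row(source_row):
    """Map a source-system record onto the dimension table's columns."""
    return {
        "customer_key": source_row["cust_id"],
        "name": source_row["cust_name"],
        # New mapping; a default guards sources that lack the field.
        "region": source_row.get("sales_region", "UNKNOWN"),
    }

# 3. View/extract modification: every view exposing the dimension
#    must be rebuilt to surface the new column.
CUSTOMER_VIEW_DDL = """
CREATE OR REPLACE VIEW v_customer AS
SELECT customer_key, name, region FROM dim_customer;
"""
```

And this sketch omits the logical-model redesign, the data reload, and any downstream spreadsheets quietly keyed to the old column list.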
In short, quick changes to a data warehouse are measured in months, not days, hours, or minutes. Can you say backlog?
The root cause of this problem is that the modeling process is presumed complete at the beginning of each development phase. It isn't open-ended, and it can't cope with changes gracefully. The physical model becomes the reference model; all back-end and front-end routines address the physical model directly. Although there's metadata all over a data warehouse, most of it is passive and used primarily by one tool for its own purposes; even in those cases, when changes occur, the administrator of the metadata tools must remap the metadata to the physical layers of the database.
At a functional level, data warehousing didn't achieve the elusive "closing the loop": for one thing, most systems lack a bidirectional data flow; for another, the warehouse's cumbersome load processes are an impediment to real-time data. EAI and EII have better stories to tell.
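The real-time contrast can be sketched in a few lines. The following is an illustration, not any vendor's API: an EII-style layer answers a query by joining two live sources at request time, with no batch load into a central store first. The source and field names are hypothetical.

```python
# Stand-ins for two live operational sources an EII layer would
# query directly (hypothetical data and field names).
ORDERS_SYSTEM = [   # e.g., an OLTP order system
    {"order_id": 1, "cust_id": 7, "amount": 120.0},
    {"order_id": 2, "cust_id": 9, "amount": 75.5},
]
CRM_SYSTEM = {7: "Acme Corp", 9: "Globex"}  # e.g., a CRM lookup

def federated_order_report():
    """Join the two sources on the fly, federation-style: results
    reflect the sources as of this moment, and nothing is persisted."""
    return [
        {"customer": CRM_SYSTEM[o["cust_id"]], "amount": o["amount"]}
        for o in ORDERS_SYSTEM
    ]
```

The trade-off, as Hayler argues, is that this approach inherits each source's data quality and query performance as-is, and carries no history.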
The EAI Story
Just a few short years ago, at least at the operational level, it seemed like data integration was going to be a solved problem. Many organizations chose to adopt a single reference model provided by an ERP applications vendor. It was a good news/bad news situation: While everyone applauded the potential end of data fragmentation, few organizations were comfortable with the locked-in notion that the ERP model was the right one for every situation.
Then, organizations discovered how difficult it was to integrate third-party software with ERP systems, and to develop and maintain that software through ERP version upgrades. EAI came to the rescue by providing standard "connectors" between different packaged systems. EAI provided a convenient, but not simple, method of connecting two systems at the programmatic level.
Naturally, when the systems to be connected weren't off-the-shelf, organizations used connector kits. This practice points to a drawback of EAI: The solutions still required a fair amount of programming. Plus, EAI didn't provide much data integration because the software operated at the business-process level and supported primarily transaction processing. And for many organizations, EAI's ephemeral style of integrating instantaneously but not persistently was too confining for the amount of effort and the cost. To the extent that EAI and the connected ERP applications enforced standardized semantics (usually referred to as "canonical representation"), the tools didn't expose metadata in a way that other tools could consume easily.
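The canonical-representation idea mentioned above can be sketched simply. In this hypothetical example (message formats and field names are invented), each system gets one adapter into a shared canonical shape, so any pair of systems can interoperate through it rather than through point-to-point translations.

```python
# Illustrative EAI-style adapters mapping each system's native order
# message into one canonical representation (hypothetical formats).

def from_erp(msg):
    """ERP order message -> canonical order."""
    return {"id": msg["OrderNo"],
            "customer": msg["CustName"],
            "total": msg["NetAmount"]}

def from_webshop(msg):
    """Web-shop order message -> canonical order."""
    return {"id": msg["orderId"],
            "customer": msg["buyer"],
            "total": msg["grandTotal"]}

# With N systems, each needs one adapter to the canonical form,
# instead of up to N*(N-1) point-to-point translations.
```

The article's complaint still holds in this sketch: the canonical semantics live inside the adapters as code, not as metadata that other tools could consume.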
In fairness, most EAI vendors have expanded their offerings and leveraged the tools and knowledge for broader usage. However, other integration vendors have also expanded their offerings, blurring the distinctions between ETL, EAI, and EII.