Remember the great proliferation of data marts in the late 1990s? By the turn of the century, most large corporations had dozens sometimes hundreds of data marts, resulting in an administrative nightmare. Then the pendulum swung the other way, and the same companies consolidated their data marts as best they could, during the great database consolidation that geared up in 2001. According to Giga Information Group, 17 percent of Global 2000 companies completed a data mart consolidation project by the end of 2002.
Most IT professionals are fully aware of the rise and fall of the data mart, yet few realize (or will admit) that the operational data store (ODS) is suffering an analogous and equally painful trend.
Before I dive into the ODS debacle, let me define some terms. People talk about "the" ODS, whereas there are actually several different types. I can't define them all here, so I'll just describe the ODS type that's the real culprit. It's a mid-tier database into which data about a single business subject is integrated.
For many years, the most common business subject has been operations. Batch processes refresh the ODS overnight with operational data, then run operational reports, so executives can study the previous day's corporate performance each morning. In recent years, however, the most common subject for a newly implemented single-subject ODS is the customer. Therefore, some companies have multiple ODSs, on multiple subjects such as operations, customers, sales, finances, orders, and so on. In fact, some companies have two customer ODSs: one for operational purposes, and another for reporting or direct marketing.
As with the mart, restricting an ODS to a single subject has merit, such as performance tuning (to enable fast lookup) or a data model peculiar to a subject (such as the very wide records of a customer ODS). Furthermore, other tasks you might accomplish through an ODS (such as data staging, application integration, reporting, data analysis, and master data management) all have unique requirements. As a result, satisfying multiple requirements with a single ODS is difficult. For these reasons, multiple ODSs, each modeled and optimized for a particular data subject or other use, sometimes make sense.
The catch-22 of the single-subject ODS concept is that it encourages proliferation and inevitably leads to a plague of homegrown integration solutions, each with an ODS at its hub. Today's conventional wisdom says you should be consolidating databases for the sake of administrative efficiency, not proliferating more. And that's exactly the problem with proliferating ODSs: numerous, noncentralized databases almost always add up to high IT costs for administration and maintenance.
Even though many ODSs are small, simple, and easy to maintain in the beginning, when an ODS is successful, much is added to it, and costs spiral as IT extends it. Database architect payroll is burned up remodeling each version of the ODS. DBA payroll is burned up reorganizing and reindexing the database to optimize its performance. Homegrown data integration often accompanies the homegrown ODS, so programmer payroll is burned up writing new integration routines. If database modelers, DBAs, and programmers leave, they take their intellectual property with them, making the solution even more unmanageable.
If you already have a large number of single-subject ODSs, what can you do? You can't kill these, because (despite the administrative challenge) they serve useful purposes, they represent a considerable investment, and there aren't many compelling replacements at the moment. Instead of abandoning your single-subject ODSs, focus on best practices for ODS federation, architecture, and consolidation.
Federate your ODSs. Similar to how multiple single-subject data marts may be federated via shared dimensions, multiple single-subject ODSs can also be federated via redundant data structures. For instance, you probably want to run broad queries across related ODSs, so you'd federate the customer-facing ones such as customer, orders, shipping, call center, and so on. At the least, this step reduces the silo effect of multiple ODSs.
Architect your ODS population. Note that federation requires that an architecture be selected and designed; otherwise, the common elements necessary for federation aren't in place. With most legacy ODSs, no architecture was selected or designed. They simply evolved and were added to. The ideal practice is to design a federated ODS architecture and implement each ODS as an integrated component of that architecture.
Consolidate related ODSs. But, when you have multiple ODSs that are obviously related (such as the customer-oriented ones mentioned earlier), they may be candidates for the ultimate solution: ODS consolidation. Companies with many ODSs can never reduce them to one. But it's worthwhile from an administrative standpoint (and possibly from a database licensing standpoint) to consolidate related ones.
Consider vendors' ODSs. The vast majority of ODSs I'm thinking of are homegrown, because very few vendor products package an ODS or equivalent product. Although many packaged data warehouse products include an ODS, I'm not talking about that kind of ODS. The most common subject for an ODS in recent years has been "customer." Luckily, some vendor products implement a so-called customer hub. It's like a customer ODS, but with integration technologies and business rules to control read/write access to the hub. For this specific type of ODS, users should investigate products from DWL, MioSoft, and Siperian.
Use integration tools. All too often, the homegrown ODS is fed by a hand-coded extract, transform, and load (ETL) application or (worse) by manually created replication devices such as triggers. Instead of compounding the administrative challenge of homegrown ODSs with a homegrown integration solution, use vendor tools for ETL, database replication, and other integration technologies.
Consider other integration technologies. Many ODSs are 10 or more years old, and so are ready for replacement. Instead of replacing a legacy ODS with a newly designed one, consider replacing it with an integration technology. For instance, many ODSs enable a "database-oriented form of application integration." Now that enterprise application integration (EAI) technologies have reached a mature level, they are better suited for this purpose and should be considered.
Many user organizations have asked me specifically about replacing a legacy ODS with a vendor's platform for enterprise information integration (EII). When the ODS supports operational reporting on a 24-hour cycle, replacing it with EII can be a compelling solution, because EII can enable more frequent reporting cycles than the usual batch-oriented integration pro- cess that feeds an ODS.
That's easier said than done, so the next best thing would be to follow the best practices for single-subject ODSs described here. They can at least help you slow down ODS proliferation, perhaps even reverse it into a smaller number of more manageable ODSs. As with any corrective project, this can be painful. But the savings in administration and maintenance make it well worth the effort.
Philip Russom [[email protected]] is a Giga analyst at Forrester Research Inc., where he provides advice to user organizations about business intelligence, data warehousing, and data integration.