Kimball University: Integration for Real People - InformationWeek
Software // Information Management
12:25 PM

Kimball University: Integration for Real People

These step-by-step guidelines will help dimension managers and users drill across disparate databases.

Integrating Labels

To get two databases to relate to each other, at least some of the labels in each database must have the same name and the same domain. For instance, if we have a notion of product "category" in both databases, we can use category as a point of integration if and only if the category field in the product dimension in the two databases has the same contents (as when drawn from the same domain). With this simple criterion, we are astonishingly close to achieving integration! If we issue separate but similar queries to the two databases, such as:

Database-1: select category, sum(private_measure1) from fact1 ... group by category

Database-2: select category, sum(private_measure2) from fact2 ... group by category

Then in virtually all BI tools we can sort-merge the two answer sets returned from the separate databases on the category "row header" to get a multirow results set with the column headers

category, sum_private_measure1, sum_private_measure2

It's crucial that you appreciate the importance of this simple result. We have successfully aligned measures from two separate databases on the same rows of a report and at the same level of granularity; because the category label has been carefully integrated across the two databases, at a certain level the report makes sense regardless of the value of measure1 and measure2. Even without integrating the measures across the two databases, we have achieved a powerful, valid integration result.

This method of assembling results from two databases in a single results set, often called "drill across," has some other powerful advantages. Since the queries are launched to the two databases separately, the databases can be on separate data servers, or even hosted by separate database technologies. Both databases can manage performance independently. But even in the most centralized environment, we almost always have to execute this drill across against two independent fact tables. This approach has nothing to do with dimensional modeling; regardless of your modeling persuasion, it simply isn't possible to put multiple datasets of different granularity and dimensionality into a single table.

Integrating the labels of an enterprise is a huge and important step, but it's not easy and the main challenge isn't technical.

Getting the Enterprise to Agree

Using our product category example, there is bad news and good news in making sure that "category" has the same name and the same domain in multiple databases. The bad news is that as the data warehouse architect, you must convene a meeting of all stakeholders who would like to perform drill-across queries with the affected databases. You must reach 100 percent agreement among these stakeholders as to permissible category values and how these values are assigned to all of the products.

Before you dismiss this level of agreement as impossible, consider the good news that it's not absolutely necessary for any stakeholders to change any of their cherished product labels! If the group decides that category is a "conformed attribute" (a term for special labels that support integration), then the category label can be a newly introduced field that didn't exist in anyone's version of the product dimension. Such a narrow solution is, of course, disappointing, even if integration has been achieved, because category would be the only row header that could be used in drill-across queries! Far better that many labels could be found and administered to have the same names and domains across all the integrated databases (and in practice, this is more often the case). The majority of the important dimensions are filled with conformed attributes, but various stakeholders can insist on keeping some private labels. These nonconformed attributes can only be used as row headers or constraints on queries on their "home" databases.

2 of 6
Comment  | 
Print  | 
More Insights
Newest First  |  Oldest First  |  Threaded View
How Enterprises Are Attacking the IT Security Enterprise
How Enterprises Are Attacking the IT Security Enterprise
To learn more about what organizations are doing to tackle attacks and threats we surveyed a group of 300 IT and infosec professionals to find out what their biggest IT security challenges are and what they're doing to defend against today's threats. Download the report to see what they're saying.
Register for InformationWeek Newsletters
White Papers
Current Issue
Annual IT Salary Report 
Base pay for IT professionals has remained flat this year with a median annual salary of $88,000 for staff and $112,000 for management. However, 58% of staff and 62% of managers who responded to our survey say they're satisfied with their compensation. Download this report to find out which positions earn the highest compensation.
Twitter Feed
InformationWeek Radio
Archived InformationWeek Radio
Join us for a roundup of the top stories on for the week of November 6, 2016. We'll be talking with the editors and correspondents who brought you the top stories of the week to get the "story behind the story."
Sponsored Live Streaming Video
Everything You've Been Told About Mobility Is Wrong
Attend this video symposium with Sean Wisdom, Global Director of Mobility Solutions, and learn about how you can harness powerful new products to mobilize your business potential.
Flash Poll