These step-by-step guidelines will help dimension managers and users drill across disparate databases.
To get two databases to relate to each other, at least some of the labels in each database must have the same name and the same domain. For instance, if we have a notion of product "category" in both databases, we can use category as a point of integration if and only if the category field in the product dimension in the two databases has the same contents (as when drawn from the same domain). With this simple criterion, we are astonishingly close to achieving integration! If we issue separate but similar queries to the two databases, such as:
Database-1: select category, sum(private_measure1) from fact1 ... group by category
Database-2: select category, sum(private_measure2) from fact2 ... group by category
Then in virtually all BI tools we can sort-merge the two answer sets returned from the separate databases on the category "row header" to get a multirow results set with the column headers
It's crucial that you appreciate the importance of this simple result. We have successfully aligned measures from two separate databases on the same rows of a report and at the same level of granularity; because the category label has been carefully integrated across the two databases, at a certain level the report makes sense regardless of the value of measure1 and measure2. Even without integrating the measures across the two databases, we have achieved a powerful, valid integration result.
This method of assembling results from two databases in a single results set, often called "drill across," has some other powerful advantages. Since the queries are launched to the two databases separately, the databases can be on separate data servers, or even hosted by separate database technologies. Both databases can manage performance independently. But even in the most centralized environment, we almost always have to execute this drill across against two independent fact tables. This approach has nothing to do with dimensional modeling; regardless of your modeling persuasion, it simply isn't possible to put multiple datasets of different granularity and dimensionality into a single table.
Integrating the labels of an enterprise is a huge and important step, but it's not easy and the main challenge isn't technical.
Getting the Enterprise to Agree
Using our product category example, there is bad news and good news in making sure that "category" has the same name and the same domain in multiple databases. The bad news is that as the data warehouse architect, you must convene a meeting of all stakeholders who would like to perform drill-across queries with the affected databases. You must reach 100 percent agreement among these stakeholders as to permissible category values and how these values are assigned to all of the products.
Before you dismiss this level of agreement as impossible, consider the good news that it's not absolutely necessary for any stakeholders to change any of their cherished product labels! If the group decides that category is a "conformed attribute" (a term for special labels that support integration), then the category label can be a newly introduced field that didn't exist in anyone's version of the product dimension. Such a narrow solution is, of course, disappointing, even if integration has been achieved, because category would be the only row header that could be used in drill-across queries! Far better that many labels could be found and administered to have the same names and domains across all the integrated databases (and in practice, this is more often the case). The majority of the important dimensions are filled with conformed attributes, but various stakeholders can insist on keeping some private labels. These nonconformed attributes can only be used as row headers or constraints on queries on their "home" databases.
The Agile ArchiveWhen it comes to managing data, don’t look at backup and archiving systems as burdens and cost centers. A well-designed archive can enhance data protection and restores, ease search and e-discovery efforts, and save money by intelligently moving data from expensive primary storage systems.
2014 Analytics, BI, and Information Management SurveyIT’s tried for years to simplify data analytics and business intelligence efforts. Have visual analysis tools and Hadoop and NoSQL databases helped? Respondents to our 2014 InformationWeek Analytics, Business Intelligence, and Information Management Survey have a mixed outlook.
Join InformationWeek’s Lorna Garey and Mike Healey, president of Yeoman Technology Group, an engineering and research firm focused on maximizing technology investments, to discuss the right way to go digital.