Dimensional Relational vs. OLAP: The Final Deployment Conundrum - InformationWeek

InformationWeek is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Software // Information Management

Dimensional Relational vs. OLAP: The Final Deployment Conundrum

The choice between deploying relational tables or OLAP cubes is not a trivial matter. Weigh these 34 pros and cons of each approach early in the design of your extract-transform-load system.

Ralph Kimball Ralph Kimball

It has become fashionable to regard the final deployment step of building an extract-transform-load (ETL) system as a mere tactical choice between delivering dimensional relational tables or OLAP cubes into the end-user environment. But is this choice quite so superficial? Should we defer thinking about this choice until just before roll-out? In this article we take a hard look at this final deployment conundrum and urge you to resolve this question very early in the design process Business intelligence (BI) developers have largely accepted the premise that data is most user-friendly when delivered in a dimensional format. If you define the data warehouse as the platform for all forms of BI, then the last step of the ETL process in the data warehouse is exposing the data in a dimensional format. Many BI developers have recognized that a properly designed set of dimensional relational tables can be transformed into an OLAP cube in a virtual one-to-one mapping. For various reasons explained below, I recommend that all OLAP cubes be built directly from dimensional models. The dimension tables in such a relational schema become the OLAP cube dimensions, often referred to as the edges of the cube. The fact tables from the relational schema provide the content of the OLAP cube cells.

While there can be some small semantic differences between the relational dimensions and the OLAP cube dimensions, the profound overlap between these two approaches has made it tempting to regard the final deployment choice as a tactical maneuver executed at the very end of the data warehouse development. Worse yet, the argument is sometimes made that BI applications can be switched between relational and OLAP implementations because of this similarity.

Not so fast! Under the right circumstances, the ETL pipeline can be mostly insulated from the final deployment choice, but the relational vs. OLAP choice is multi-faceted decision with lots of issues to consider. Let’s look at the advantages and disadvantages of both choices before we jump on either bandwagon.

    Dimensional Relational Advantages

  • Relational database structures are largely vendor independent and especially dimensional structures are pretty easily ported. However, ETL command line scripts and proprietary code like PL/SQL may not be very portable.
  • All the main relational DBMS’s have high-volume bulk data loaders, and they're especially effective if you turn off transaction logging.
  • A wide variety of SQL-generating tools from many vendors are able to directly access the data. Usually these tools can be repurposed to point at a new relational DBMS.
  • SQL expertise is widely available in the marketplace, since the main features of SQL have long been standardized, and SQL is routinely taught at the college level.
  • Hand-coded SQL is generally readable, although the SQL emitted by high-end BI tools is overwhelming and cannot reasonably be changed by the BI applications development staff.
  • There are many different ways to control DBMS performance, including schema designs, indexes, aggregates and materialized views. Dimensional relational structures, because of their predictable characteristics, have well-understood performance tuning techniques, although these techniques vary somewhat by vendor.
  • Relational databases are extremely stable and are suitable for serious archiving and backup.
  • Relational database structures are not vulnerable to catastrophic invalidation, unlike OLAP cubes that may involuntarily rebuild themselves if a SCD Type 1 change is made to a dimension.
  • Database sizes are pretty much unlimited, and single dimensional fact tables in the multi-terabyte range are increasingly common.
  • High-end relational databases can join many large tables.
  • Hybrid (mixed workload) applications involving both querying and updating are usually easy to construct.

    Relational Disadvantages

  • SQL is a truly horrible language for powerful analysis and complex applications.
  • SQL is severely asymmetric: you can perform complex constraints and calculations generally only WITHIN records, not across records.
  • In spite of performance tuning capabilities, it is still too easy to lose control of performance.

    OLAP Advantages

  • Generally much better performance than relational when the cubes are designed correctly, with less need to play complex performance tuning games compared to relational.
  • Much more powerful analytic capabilities than relational. MDX, for example, has more powerful semantics for traversing ragged unbalanced hierarchies, such as organization charts, than does SQL.
  • Vendor-supplied OLAP tools for reporting and querying have historically been superior to relational tools, although I'm impressed by the steady investment relational tool vendors have made in improving ease of use and feature sets.
  • OLAP doesn't suffer from the symmetry problem that limits SQL.
  • The decision to load OLAP directly from dimensional relational tables doesn’t affect the ETL backroom very much: the deployment step occurs at the very end.
  • Certain loading scenarios can be very fast.
  • Certain vertical industries, especially financial services, have developed awesome OLAP solutions.
  • OLAP encourages more complex security scenarios, especially for ad hoc access. By comparison, it’s difficult to set up a relational database to protect detailed data (sales by sales rep), but provide more open access to summarized data (sales by region). This is especially true for ad hoc access on the relational side. Security is significantly more powerful on OLAP because of the semantics about parents and children inherent in the access languages.

    We welcome your comments on this topic on our social media channels, or [contact us directly] with questions about the site.
    1 of 2
    Comment  | 
    Print  | 
    More Insights
InformationWeek Is Getting an Upgrade!

Find out more about our plans to improve the look, functionality, and performance of the InformationWeek site in the coming months.

Blockchain Gets Real Across Industries
Lisa Morgan, Freelance Writer,  7/22/2021
Seeking a Competitive Edge vs. Chasing Savings in the Cloud
Joao-Pierre S. Ruth, Senior Writer,  7/19/2021
How CIO Roles Will Change: The Future of Work
Jessica Davis, Senior Editor, Enterprise Apps,  7/1/2021
White Papers
Register for InformationWeek Newsletters
Current Issue
Monitoring Critical Cloud Workloads Report
In this report, our experts will discuss how to advance your ability to monitor critical workloads as they move about the various cloud platforms in your company.
Flash Poll