Commentary by Curt Monash, 9/22/2009 02:23 PM

Issues Comparing Analytic DBMS Performance


The analytic DBMS/data warehouse appliance market is full of competitive performance claims. Sometimes, they're completely fabricated, with no basis in fact whatsoever. But often performance-advantage claims are based on one or more head-to-head performance comparisons. That is, System A and System B are used to run the same set of queries, and some function is applied that takes the two sets of query running times as an input, and spits out a relative performance number as an output.

For example, Greg Rahn twittered to me that Oracle Exadata commonly outperforms existing Oracle installations by a factor of 50 or better, based on a "geometric mean". What I presume he meant by that is:

  • At any one user installation, a number of queries were compared on new system vs. old.
  • In each case, the ratio between new and old running time was taken.
  • The geometric mean of all those ratios was computed.
Math note: Reversing the order of the second and third steps doesn't change the outcome at all. Either way, you wind up multiplying N things together, dividing by the product of another N things, and taking the Nth root of all that.
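The order-invariance claim in the math note is easy to check numerically. Here is a minimal sketch with made-up query times (not from any real benchmark) showing both orderings of the computation:

```python
import math

# Hypothetical per-query running times in seconds on the old and new
# systems; these numbers are illustrative, not from any real benchmark.
old = [120.0, 45.0, 300.0, 18.0]
new = [2.0, 1.5, 4.0, 0.9]
n = len(old)

# Ordering 1: take each per-query ratio first, then the geometric mean
# of the ratios.
ratios = [o / s for o, s in zip(old, new)]
geo_mean_of_ratios = math.prod(ratios) ** (1 / n)

# Ordering 2: geometric-mean each set of times first, then divide.
geo_old = math.prod(old) ** (1 / n)
geo_new = math.prod(new) ** (1 / n)
ratio_of_geo_means = geo_old / geo_new

# Both orderings multiply N things, divide by the product of another N
# things, and take the Nth root -- so they agree.
assert math.isclose(geo_mean_of_ratios, ratio_of_geo_means)
print(f"speedup: {geo_mean_of_ratios:.1f}x")
```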

Looking just at the arithmetic, a straightforward geometric-mean approach is not a terrible methodology. Theoretically, I'd prefer to just add up the running time of the whole workload and divide the two aggregates. But I've tested that change in a couple of cases, and it didn't seem to make a big difference. In particular, the geometric mean is better than the arithmetic mean, which gives huge weight to the most extreme number(s) in the set. (SAP used to do just that in marketing BI Accelerator, getting huge results because one customer once got a better than 600X speedup on one particular query out of eight or so.)
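The outlier-sensitivity point can be seen with a toy example. The sketch below uses eight invented speedup ratios, one of them extreme (loosely echoing the 600X anecdote; the numbers themselves are made up):

```python
import math

# Invented per-query speedup ratios for eight queries, one an extreme
# outlier. These are illustrative, not real benchmark results.
ratios = [3.0, 5.0, 2.0, 4.0, 6.0, 3.0, 2.0, 600.0]
n = len(ratios)

# The arithmetic mean is dominated by the single 600X outlier.
arith_mean = sum(ratios) / n

# The geometric mean damps the outlier's influence considerably.
geo_mean = math.prod(ratios) ** (1 / n)

print(f"arithmetic mean: {arith_mean:.1f}x")
print(f"geometric mean:  {geo_mean:.1f}x")
```

With these numbers, the arithmetic mean lands near 78x while the geometric mean stays in single digits, even though seven of the eight queries sped up by 6x or less.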

That said, there are a number of factors that can render such comparisons seriously misleading. For starters, most of these comparisons fail to consider how often each query will be run. (One advantage to my preferred approach -- add up total running time before doing any other arithmetic -- is that you can easily weight queries by frequency.) Beyond that, especially when a new challenger system is compared to an old incumbent:

  • Comparisons aren't always run on identical or comparable hardware.
  • One system might be running an obsolete release of the software.
  • Queries may not be equally well written and tuned on both systems.*
  • Both systems may not be equally well administered and/or configured.
In addition:
  • A small set of queries may not be representative of a whole workload.
  • Since complete workloads are rarely tested, concurrency and contention may not be properly modeled.
And last but not least:
  • Marketing departments tend to take their best results and claim those are "typical."
*I suspect that some of the most dramatic speed-ups we see are for queries that are just plain badly written. On the other hand -- if you've been running your data warehouse software for a few years and still haven't figured out how to write your queries for decent performance, maybe it's somewhat too hard to use...
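The frequency-weighting idea mentioned above -- sum total running time before doing any other arithmetic -- can be sketched as follows. The query names, run frequencies, and times are all hypothetical:

```python
# Hypothetical workload: how often each query runs per day, and its
# running time in seconds on the old and new systems.
runs_per_day = {"q1": 500, "q2": 20, "q3": 1}
old_secs = {"q1": 2.0, "q2": 90.0, "q3": 3600.0}
new_secs = {"q1": 1.0, "q2": 5.0, "q3": 40.0}

def weighted_total(times, freq):
    # Total daily running time, each query weighted by its frequency.
    return sum(times[q] * freq[q] for q in freq)

old_total = weighted_total(old_secs, runs_per_day)
new_total = weighted_total(new_secs, runs_per_day)
print(f"workload-level speedup: {old_total / new_total:.1f}x")
```

Note how the weighting changes the story: the rarely-run q3 has the biggest per-query speedup (90x), but the overall workload speedup is governed mostly by the queries that run hundreds of times a day.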

