7 Gotchas That Wreck Data Warehouse Scalability

Warnings from data warehouse consultant Richard Winter.
1. Wait until the system is built to test for scalability
There's always a temptation to wait too long before doing performance and scalability testing. The classic trap is waiting until the system is ready to go into production. Sure, the test is realistic because you can run the actual database and application, but if you discover something wrong, it's often too late to do anything about it. Test for scalability before you're committed.

2. Live with vague expectations
Database people think that if no requirements are established, no one can prove they failed. In reality, this situation is often worse: management assumes that the system will meet all expectations, so the system is never good enough. You're much better off setting realistic expectations that can be met.

3. Skip requirements
Users often don't know what the requirements are. You have to help them visualize a new business process and the requirements for supporting it. Only then can you develop valid usage scenarios and engineering requirements.

If, for example, you currently mail a giant catalog to all customers quarterly, and you want instead to do 100 targeted mailings of specialty catalogs each going to about 2% of your customers, then hold two facilitated discussions with stakeholders. First, talk about what the new mailing process will be and how it will get carried out 25 times as often each year. Second, explore the information capabilities needed to support the process. Then work out data warehouse usage scenarios and develop the necessary workloads, service levels, and other requirements. Don't skip identifying requirements, or you'll end up back at pitfall No. 2.
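The arithmetic behind the catalog example can be sketched in a few lines. This is only an illustration: the customer count is a made-up figure, the function name is hypothetical, and it assumes the 100 targeted mailings take place over one year, as the "25 times as often" comparison implies.

```python
def mailing_workload(customers, campaigns_per_year, share_targeted):
    """Return yearly campaign count and catalogs mailed for one mailing strategy."""
    recipients_per_campaign = round(customers * share_targeted)
    return {
        "campaigns": campaigns_per_year,
        "catalogs_mailed": recipients_per_campaign * campaigns_per_year,
    }


CUSTOMERS = 1_000_000  # hypothetical customer count, not from the article

# Current process: one giant catalog to every customer, four times a year.
current = mailing_workload(CUSTOMERS, campaigns_per_year=4, share_targeted=1.0)
# Proposed process: 100 specialty catalogs, each to about 2% of customers.
targeted = mailing_workload(CUSTOMERS, campaigns_per_year=100, share_targeted=0.02)

campaign_ratio = targeted["campaigns"] / current["campaigns"]
print(f"Campaigns per year: {current['campaigns']} -> {targeted['campaigns']} "
      f"({campaign_ratio:.0f} times as often)")
print(f"Catalogs mailed per year: {current['catalogs_mailed']:,} -> "
      f"{targeted['catalogs_mailed']:,}")
```

Under these assumptions the number of catalogs mailed goes down, while the number of customer-selection runs the warehouse has to support goes up 25-fold. That shift in workload is the kind of requirement the facilitated discussions are meant to surface.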

4. Skip risk analysis
Once you develop requirements, identify, test, and manage the risks that emerge.

5. Accept flimsy "proofs"
Beware of salespeople taking over the definition of the proof. Never let the vendor define the test to be performed. If you don't have the expertise in-house, get a consultant with experience defining benchmark specifications for testing complex data management systems. Your test has to capture the key challenges of scale and performance.

6. Underestimate growth rates
Knowing this year's requirements isn't enough. Architectural and platform decisions will take a while to implement and longer to change. Project requirements out two to three years, at a minimum; it's better to have a projection that gets revised than to shoot in the dark. And don't assume that the data growth rate is the same as the business growth rate. Data and workloads tend to grow faster than the related businesses because data gets used more intensively as the business gains momentum.
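A rough way to see why the two growth rates should be projected separately is a simple compound-growth calculation. The sketch below is a minimal illustration, not a sizing method: the starting size and both growth rates are hypothetical placeholders, and real projections would come from your own usage scenarios.

```python
def project(current_value, annual_growth_rate, years):
    """Compound a starting value forward, returning one figure per year (year 0 first)."""
    return [current_value * (1 + annual_growth_rate) ** year
            for year in range(years + 1)]


# All figures below are hypothetical assumptions for illustration only.
CURRENT_DATA_TB = 10.0        # today's warehouse size in terabytes
BUSINESS_GROWTH_RATE = 0.15   # assumed yearly business growth
DATA_GROWTH_RATE = 0.50       # assumed yearly data growth, faster than the business
HORIZON_YEARS = 3             # the article suggests two to three years at a minimum

if_tracking_business = project(CURRENT_DATA_TB, BUSINESS_GROWTH_RATE, HORIZON_YEARS)
if_tracking_data = project(CURRENT_DATA_TB, DATA_GROWTH_RATE, HORIZON_YEARS)

for year, (naive, realistic) in enumerate(zip(if_tracking_business, if_tracking_data)):
    print(f"Year {year}: {naive:6.1f} TB at business rate, "
          f"{realistic:6.1f} TB at data rate")
```

The gap between the two columns widens every year, which is the point of the warning: a platform sized from the business growth rate alone can be outgrown well before the planning horizon ends. The same function can be applied to query volumes or user counts to project workload as well as data size.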

7. Ignore any dimension of scalability
Data size is the dimension easiest to measure, but the workload, data complexity, query complexity, availability, and data latency dimensions are nearly as important. They can all drive configuration size and determine whether you're on the right platform. Take them all into account.

Illustration by Sek Leung