Managing Data Warehouse Growth

One of the biggest challenges in business intelligence and data warehousing initiatives is managing growth. Conflicting demands to support more users, deal with increased query and data complexity, and add more "right time" information have many at a crossroads. We explore technological changes that will make it easier to scale and offer advice on guiding the growth.

Getting A Handle On Growth

What should you do to prevent data warehouse requirements from surpassing your ability to manage them? You don't have to be implementing a 500 TB warehouse to encounter challenges in performance, scalability or availability. The complexity of the problems often stumps organizations more than just sheer scale. A company might have less than a terabyte of user data but face such complex business rules that no database system could easily handle the benchmarking. With just 1 TB of data and a sufficiently complex schema and workload, you'll find yourself up against some tough problems.

The first step is to understand the business drivers and make sure your business-level requirements and schedules reflect those drivers accurately. Second, define those requirements by year over a strategic time frame (typically three to five years). Third, develop a set of concrete usage scenarios that are characteristic of the workload that the business requirements will demand.
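As a rough illustration of defining requirements by year, the sketch below compounds a baseline forward over a five-year horizon. Everything in it is an assumption made for illustration: the names `YearlyRequirement` and `project_requirements`, the two metrics chosen, and especially the starting figures and growth rates, which are placeholders rather than measurements or recommendations.

```python
from dataclasses import dataclass


@dataclass
class YearlyRequirement:
    """One year's requirement targets (hypothetical metrics)."""
    year: int
    data_tb: float
    concurrent_users: int


def project_requirements(base_data_tb, base_users, data_growth, user_growth, years):
    """Compound the baseline forward, producing one entry per year of the horizon."""
    plan = []
    for year in range(1, years + 1):
        plan.append(
            YearlyRequirement(
                year=year,
                data_tb=round(base_data_tb * (1 + data_growth) ** year, 2),
                concurrent_users=round(base_users * (1 + user_growth) ** year),
            )
        )
    return plan


# Placeholder inputs: 1 TB and 200 users today, growing 50% and 25% per year.
plan = project_requirements(
    base_data_tb=1.0, base_users=200, data_growth=0.5, user_growth=0.25, years=5
)
for requirement in plan:
    print(requirement)
```

A real plan would carry more dimensions than these two (query complexity, data latency, service levels), and the growth rates would come from the business drivers rather than from a single compounding assumption.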

With the business drivers and requirements clear, you can focus more closely on the data-warehouse issues, including database size, structure, workload and service-level agreements. Build a margin of safety into your requirements; you'll never have complete certainty about how the system will be used and what might ultimately expand those requirements (such as success!). Then it's time to evaluate the long-term and near-term trade-offs:

• In the long term, the ability to leverage data from across the enterprise pays huge dividends, and is the key to most of the success stories you hear about. Clearly, the core of your enterprise data must be integrated to provide a platform for rapid and inexpensive implementation of analytical solutions.

• In the near term, you want to identify where you can gain significant cost or performance advantages in a specific application or sandbox with a data-mart or appliance approach. Remember that some benefits may come at the price of fragmenting decision support data and incurring greater overhead for replication, ETL and other data movement and integration.

Testing your intended solution against your detailed requirements is critical before committing to it. This advice applies to everything that has to scale: the platform, configuration, database design and more.
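One way to make such a test concrete is to compare measured results with the requirement targets after padding them by the margin of safety. The sketch below assumes every metric is a capacity figure where higher is better; the function name `check_against_requirements`, the 20 percent margin, and all target and measured values are hypothetical placeholders, not benchmark results.

```python
def check_against_requirements(targets, measured, safety_margin=0.2):
    """Return the metrics whose measured value falls short of target plus margin.

    Assumes every metric is a capacity figure where higher is better.
    A metric missing from `measured` is treated as a shortfall.
    """
    shortfalls = {}
    for name, target in targets.items():
        required = target * (1 + safety_margin)
        got = measured.get(name, 0.0)
        if got < required:
            shortfalls[name] = (got, required)
    return shortfalls


# Placeholder figures for illustration only.
targets = {"data_tb": 7.5, "concurrent_users": 600, "queries_per_hour": 1000}
measured = {"data_tb": 10.0, "concurrent_users": 650, "queries_per_hour": 1500}

shortfalls = check_against_requirements(targets, measured)
for name, (got, required) in shortfalls.items():
    print(f"{name}: measured {got}, need at least {required:.0f}")
```

With these placeholder numbers only the concurrent-user figure falls short once the margin is applied, which is the kind of gap worth finding before committing to a platform.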

Remember, previous experience is not always a good predictor of future success. And don't simply wait and hope that scalability problems will work themselves out. Take the initiative to prove to yourself--not to mention all your stakeholders within and outside the enterprise--that the solution you've chosen will scale and be strong enough to withstand the challenges of many kinds of growth.

Richard Winter, president of Wintercorp., is an expert in large database technology, architecture and implementation. He is executive editor of Scaling Up, a quarterly newsletter, and can be reached at [email protected].

Rick Burns is vice president of engineering at Wintercorp. and conducts benchmarks, platform evaluations and architecture studies for large data warehouses. He can be reached at [email protected].