Don't Forget the Owners Manual

Ensure trouble-free data warehouse and BI deployments by answering critical planning questions.

Back-Room Operations

The second set of considerations focuses on technical systems management. Long before you go into production, you must think about a host of operational concerns. Your decisions will affect your system configuration and design. If you don't create a sound operations plan before you deploy, the data warehouse team will end up patching things together in crisis mode, which inevitably leads to crises of confidence within the user community.

The back-room operations plan should address at least the following questions:

  • How will you monitor resource usage?
  • How will you report on usage?
  • How will you automate your operations, especially the extract-transform-load (ETL) processing?
  • How will you know when your ETL processing has encountered a problem?
  • How will you notify users of data problems?
  • How will you monitor system performance?
  • How will you kill relational database and OLAP system queries?
  • How will you identify and solve performance bottlenecks?
  • How can you tune the system to prevent bottlenecks?
  • How will you ensure your system never runs out of disk space?
  • Do you need to modify your ongoing ETL process to accommodate data partitioning?

Most important, you need to plan, implement and test your backup and recovery strategy. A backup and recovery plan that's not tested regularly isn't a plan. It's a wish, and it's unrealistic of you to expect that wish to come true.

There are no easy answers to these questions. The best solutions are intertwined with your ETL system design. You can prevent a lot of problems by designing your ETL system to proactively check for conditions, such as the successful completion of an operational process, before starting work. As described in an earlier column, "The 38 Subsystems of ETL" (December 4, 2004), a robust ETL system checks for exceptions and whether the data volume and contents are reasonable. Keep track of how much disk space a daily load requires and verify that plenty of disk space is available and allocated.

Monitor Operations

Plan to monitor the operations of all the software components in your DW/BI environment: operating systems, relational databases, OLAP and data mining systems, query and reporting software and any Web portals. Developing a robust monitoring system usually requires combining features of your operating systems with features of your database engines and front-end tools. Even if you use a single vendor for your BI technology, such as Oracle or Microsoft, you should expect to combine several kinds of monitoring. Plan to create a relational database to store the monitoring performance data. You may populate the results of monitoring traces directly into the relational database, or it may make more sense to trace into files and then load those files periodically. Usually, you'll end up building a monitoring data mart to integrate the results of monitoring all the different systems into a single database.

Monitor the systems at all times. Set up the "always on" monitoring to include important events such as user logons and ETL process completions, in addition to all error events. This "always on" monitoring should periodically record statuses such as memory usage, perhaps at 15-minute intervals. Because this basic monitoring is continuous, you should justify the inclusion of every element you're tracing.

Periodically, perhaps once a month, ratchet up the monitoring to capture more events and statuses. This monthly detailed baseline can be invaluable in diagnosing performance problems, and you can often catch these problems before they're perceptible to business users. You may trace statuses at five- or 15-second intervals and keep detailed track of the processes occurring. Detailed monitoring usually affects system performance, so avoid doing it during system usage peak times. On the other hand, if you monitor the reporting system in the middle of the night, you probably won't learn anything.

These operational issues aren't sexy. You're not going to fire up the imagination of business users by talking about them. But if you don't think about these problems early on and build an operations plan into your procedures, your DW/BI system will fail. Business users won't know how to use the system and they won't know where to go to get information and assistance. Their confidence in the system, the data and your efforts will be shaken. All they'll see is the bad, and the 98 percent of the system that's good will go to waste.

Quick Study: Kimball University DW/BI Best Practices

Dos and Don'ts for DW/BI System Operations Planning

  • Do test the system and operations thoroughly before moving into production.
  • Do develop user training that's customized to your data and environment.
  • Do publish a Web site or portal with fresh information about the system, including when data was last updated.
  • Do publish information about how to access the system.
  • Do monitor the processing and query operations at all times at a nonintrusive level.
  • Do develop a procedure for killing a query (and then follow it).
  • Do ratchet up monitoring occasionally to develop a detailed performance baseline before you encounter performance problems.
  • Do develop a backup and recovery strategy.
  • Do inform business users as quickly as possible about any processing or data issues in the system, such as failed loads or (horrors!) inaccurate data.
  • Don't expect a centralized IT help desk to provide much assistance beyond troubleshooting connectivity.
  • Don't assume recovery is going to work unless you've tested it.

Required Reading

The Data Warehouse Lifecycle Toolkit by R. Kimball, L. Reeves, M. Ross and W. Thornthwaite (Wiley, 1998).

"The 38 Subsystems of ETL"

The Data Warehouse ETL Toolkit by R. Kimball and J. Caserta (Wiley, 2004).

Joy Mundy is a member of the Kimball Group. She has focused on DW/BI consulting and education for more than a decade. Write to her at [email protected].