The push to reduce latency changes the nature of database performance to support the data warehouse.

InformationWeek Staff, Contributor

April 2, 2004


Optimal database performance is perhaps the biggest challenge managers and administrators face as they work to ensure delivery of BI to data warehouse (DW) users. No one likes to wait for answers. Squeezing every last bit of performance out of the hardware and software to make those DW queries fly is more important than ever. This is especially the case as businesses increasingly manage BI systems as mission-critical applications and move toward "real-time" DW for some apps.

Out, Evil Latency!

As businesses push to reduce the data latency between DW and operational systems, the DW begins to look more like a transactional system. For a DW to deliver near real-time information, you either have to update it more frequently or access data directly from operational systems. Either way, the push to reduce latency changes the nature of database performance to support the DW. DWs were created to separate resource-intensive analytic processing from shorter duration, frequent transaction processing. If the two worlds now come back together, the churn pressure on the database system will be significant.

But even if the same data is accessed for both transactional and analytic apps, real-time goals will force different performance metrics on DW databases. Database metrics evaluate transactions based on speed; that is, their ability to "get in," "do their work," and "get back out again." DW queries can't be measured in the same way. The analytic nature of BI means that DW queries perform large and sometimes complex operations to arrive at heretofore-unknown correlations in the data. Speed isn't as important as accuracy. A better metric for such queries is repeatability; that is, does the same query against the same data at the same time of day result in similar performance characteristics? Managing performance in this way requires a different mindset.

Another challenge is that many DW queries must access aggregated data. With a separate DW environment, DBAs can denormalize physical database structures to create aggregate tables containing redundant data summarized from other data. Aggregate tables are supposed to optimize performance and increase data availability — both noble goals. However, these tables add to the size and the complexity of managing the environment.
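A hand-built aggregate table of this kind might look like the following sketch. The `sales` detail table and the summary's columns are hypothetical, but the pattern is typical: a redundant, pre-summarized copy of the detail data, repopulated by a periodic batch job.

```sql
-- Hypothetical denormalized aggregate table: redundant data
-- summarized from a detail-level sales table by region and month.
CREATE TABLE sales_by_region_month (
  region      VARCHAR(20)   NOT NULL,
  sale_month  DATE          NOT NULL,
  total_amt   DECIMAL(15,2) NOT NULL,
  order_count INTEGER       NOT NULL,
  PRIMARY KEY (region, sale_month)
);

-- A scheduled batch job rebuilds the summary from the detail data.
-- Queries must be written against sales_by_region_month explicitly;
-- the DBMS does not know it is derived from the sales table.
INSERT INTO sales_by_region_month
  SELECT region, sale_month, SUM(amount), COUNT(*)
  FROM sales
  GROUP BY region, sale_month;
```

Note the management burden: every such table must be designed, refreshed, and kept consistent by hand, and every report must know it exists.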

Modern database systems are beginning to support aggregation in an easier-to-use, performance-oriented manner. DBAs can create materialized query tables (MQTs) and materialized views that essentially turn a query into a physical table. Unlike hand-built aggregate tables, MQTs are automated and optimizer-aware: queries accessing the base tables don't need to be rewritten. Instead, the database optimizer understands the relationship between the MQT and the "real" tables accessed to create it, and the DBMS itself decides when and how to use the MQT versus the base tables. Automated refresh and management commands simplify MQT implementation. MQTs bolster a DBA's ability to deliver a real-time DW because they offer better performance than querying the normalized base tables directly.
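In DB2, for example, the aggregate above can be defined as an MQT instead. The table and column names below are illustrative; the key point is that the defining query is part of the table definition, so the optimizer can rewrite base-table queries to use it.

```sql
-- A system-maintained MQT (DB2 syntax; names illustrative).
CREATE TABLE sales_mqt AS
  (SELECT region, SUM(amount) AS total_amt, COUNT(*) AS order_count
   FROM sales
   GROUP BY region)
  DATA INITIALLY DEFERRED
  REFRESH DEFERRED
  ENABLE QUERY OPTIMIZATION
  MAINTAINED BY SYSTEM;

-- One command populates (and later re-synchronizes) the MQT.
REFRESH TABLE sales_mqt;

-- This query still references the base table; because of
-- ENABLE QUERY OPTIMIZATION, the optimizer may silently
-- rewrite it to read the much smaller sales_mqt instead.
SELECT region, SUM(amount)
FROM sales
GROUP BY region;
```

Compare this with the hand-built aggregate: no application query changes, and refresh is a single command rather than a custom batch job.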

IBM's DB2 uses the MQT term (previously known as "automated summary tables") to refer to this functionality. Oracle offers similar functionality under the name "materialized views" (previously known as "snapshots"), and Microsoft SQL Server provides a comparable capability with indexed views. The IBM term, though, is a bit more accurate because a view is virtual, whereas a table is physical. Each of these DBMSs provides varying degrees of support for query rewrite as well.
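Oracle's equivalent of the DB2 example might be sketched as follows (again with illustrative names); the ENABLE QUERY REWRITE clause is what permits the optimizer to substitute the materialized view for matching queries against the base table.

```sql
-- Oracle materialized view (names illustrative).
CREATE MATERIALIZED VIEW sales_mv
  BUILD IMMEDIATE            -- populate at creation time
  REFRESH COMPLETE ON DEMAND -- re-synchronize when told to
  ENABLE QUERY REWRITE       -- allow transparent query rewrite
AS
  SELECT region, SUM(amount) AS total_amt
  FROM sales
  GROUP BY region;
```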

Performance Fundamentals

Every database application, at its core, requires three components: the system, the database, and the application. This is true not just for OLTP but also for DW. To improve performance, the administrator must be able to monitor and tune each of these components — something that's easier said than done.

The system consists of the system software and hardware required for the application to provide service. This includes the computer itself, its disk subsystems, network connections, and all peripherals. From a software perspective the system includes the operating system, the file system, the DBMS, networking protocols, and any related middleware, such as transaction processors or message queues.

To deliver system performance, the DBA must have the resources to monitor, manage, and optimize the performance of these disparate pieces of hardware and software. Some of the tasks required for system tuning include properly allocating and managing memory structures (such as buffer pools and the program cache area), managing storage, integrating the DBMS with other system software, properly using database logs, and coordinating the DBMS's use of operating-system resources. Additionally, DBAs must control the installation, configuration, and migration of the DBMS software. If the system isn't performing properly, everything that uses the system will perform poorly. In other words, a poorly performing system affects every database application.

Memory structures, specifically data buffers, are probably the most important aspect of DW system performance. For the majority of organizations that run DW queries via standard reporting software, the database program cache hasn't been very important. However, at organizations where DW relies on many complex individual programs to deliver BI, the program cache is very important.

Indeed, every DBMS works best when it uses memory efficiently, and this is where autonomic tuning is proving beneficial. By allowing the system to expand, reallocate, and adjust memory across multiple database buffers, shops can dramatically improve performance, especially for systems with unpredictable and variable workloads: the very qualities that characterize DW queries. Modern database systems and performance solutions offer built-in intelligence that enables the system to manage itself.
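In DB2, for instance, buffer-pool memory can be fixed by hand or handed over to the self-tuning memory manager available in later releases. The pool name and size below are illustrative:

```sql
-- Hand-tuned: a 32K-page buffer pool with a fixed allocation.
CREATE BUFFERPOOL bp_dw SIZE 50000 PAGESIZE 32K;

-- Autonomic: let DB2's self-tuning memory manager grow and
-- shrink this pool as the workload shifts (DB2 9 and later).
ALTER BUFFERPOOL bp_dw SIZE AUTOMATIC;
```

The second form is what makes variable DW workloads tractable: memory flows to whichever buffer pool the current queries are stressing, without a DBA intervening.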

The second component of database performance is the database itself and the structures of which it is composed. The database stores the data used by applications. When the application needs to access data, it does so through the DBMS to the database of choice. If the database isn't optimally organized or stored, the data it contains will be difficult or slow to access. The performance of every application that requires this data will be negatively affected.

Over time, as data is modified and updated, the DBMS may have to move it around within the database. Such activity causes the data to become fragmented and inefficiently ordered. The longer the database remains online and the more changes are made to the data, the more inefficient database access can become. To overcome disorganization and fragmentation, DBAs can run a reorganization utility to refresh the data and make the database efficient once again. The key to successful reorganization, though, is to reorganize only when the database requires it. Some companies instead over-reorganize by scheduling regular reorganization jobs whether or not the database is fragmented, wasting valuable CPU cycles.
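In DB2 terms, the "reorganize only when needed" discipline might look like the following sketch (table name illustrative): check first, reorganize only if the statistics say so, then refresh the optimizer's statistics.

```sql
-- Gather fresh statistics and flag objects that actually need
-- reorganizing (REORGCHK prints F1..F8 fragmentation indicators).
REORGCHK UPDATE STATISTICS ON TABLE dw.sales;

-- Run REORG only for tables REORGCHK flagged; skip the rest.
REORG TABLE dw.sales;

-- Recollect statistics afterward so the optimizer sees the
-- newly reorganized data.
RUNSTATS ON TABLE dw.sales WITH DISTRIBUTION AND DETAILED INDEXES ALL;
```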

Modern database systems and tools provide autonomic methods to discover and correct poorly organized databases. Reorganization is becoming more automated with real-time accumulation of database statistics accompanied by intelligent agents that understand the meaning of those statistics and what to do when the statistics aren't optimal. Of course, the system must be aware of your unique environment; for example, you don't want an automatic reorganization to start up in the middle of your most important user's long-running analytic query.

But reorganization is only one of many database performance tasks required. Others include data set placement, partitioning for parallel access, managing free space, and assuring optimal compression.
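Partitioning for parallel access, for example, can be declared in the table definition itself. The sketch below uses Oracle range-partitioning syntax with hypothetical names; each partition can be placed on its own storage and scanned in parallel by large DW queries.

```sql
-- Range partitioning by date (Oracle syntax; names illustrative).
CREATE TABLE sales (
  sale_date DATE,
  region    VARCHAR2(20),
  amount    NUMBER(15,2)
)
PARTITION BY RANGE (sale_date) (
  PARTITION p2003 VALUES LESS THAN (DATE '2004-01-01'),
  PARTITION p2004 VALUES LESS THAN (DATE '2005-01-01')
);
```

A query restricted to one year can then be satisfied by pruning to a single partition instead of scanning the whole fact table.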

The third and final component of database performance focuses on the application itself. Indeed, as much as 80% of all database performance problems are caused by inefficient application code. The application code consists of two parts: the SQL code and the host language code in which the SQL is embedded.

SQL is simple to learn and easy to start using, but SQL tuning and optimization is an art that takes years to master. Every DBMS provides a method of inspecting the actual access paths that will be used to satisfy SQL requests. DBAs must be experts at understanding the different types of access paths, as well as which ones are best in which situation. Furthermore, DBAs must be able to interpret the output of the access-path explanation produced by the DBMS, since it is often encoded and cryptic.
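The mechanics vary by DBMS; in Oracle, for example, access-path inspection looks like this (assuming the standard PLAN_TABLE is installed and the query's table names are illustrative):

```sql
-- Record the access path the optimizer would choose, without
-- actually running the query.
EXPLAIN PLAN FOR
  SELECT region, SUM(amount)
  FROM sales
  GROUP BY region;

-- Format the raw, cryptic PLAN_TABLE rows into readable output
-- showing the chosen operations (full scan, index access, etc.).
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

DB2 offers the analogous EXPLAIN facility with its own explain tables; in both cases, reading the output well is the skill that takes years to build.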

Host language code refers to the application programs written in C, Cobol, Java, Visual Basic, or the programming language du jour. It's quite possible to have finely tuned SQL embedded inside inefficient host language code, which, of course, still causes a performance problem.

Modern database systems and tools are adapting to solve these issues as well. SQL tuning software that translates cryptic plan-table entries into plain English, combined with proactive advice on modifying the SQL for performance, helps guide developers and end users toward efficient SQL. DBMS optimizers are also becoming more intelligent and efficient, sometimes completely rewriting SQL internally to improve performance with no user intervention required.

The Bottom Line

As DWs evolve to meet real-time demands, how we manage database performance must also evolve, expand, and become more agile. Your organization will need administrators who understand not only the basics of database performance, but also the implications of real-time DW on performance management. Finally, by taking advantage of the autonomic features of DBMS and systems management software, you can simplify and streamline the performance of your DW systems.

Craig S. Mullins is a director of technology planning for BMC Software. He has extensive experience in database management. He is the author of two books: Database Administration: The Complete Guide to Practices and Procedures (Addison-Wesley, 2002) and DB2 Developer's Guide, 5th edition (Pearson Education, 2004). You can contact him via his Web site.
