Data Warehouse Appliances Serve Up Information by the Gulp

Promising low cost and fast deployment, data warehouse appliances are increasingly popular, but they're not right for every business. Here's a look inside this emerging class of products, their architecture and their suitability to a range of application and workload requirements.

Key Technical Factors

The diagram below plots several data warehouse appliances on a simplified scale of hardware, software and storage integration that ranges from hardware, software and storage independence, at one extreme, to fully integrated "black box," at the other extreme. The scale is not absolute, but rather is intended to illustrate the positioning of each vendor's approach to hardware, software and storage integration relative to competitors.

Optimization of disk I/O is a complex topic that database vendors, and especially relational database vendors, have been researching for years. Some of the techniques developed include data caching, clustering algorithms for laying data on disk, partitioning and various forms of indexing. But the real secret to quantum leaps in the performance of data warehouse appliances lies in techniques that are impossible or impractical in a transactional RDBMS – forced sequential I/O and chip-level optimizations. Disk I/O optimizations employed by four leading data warehouse appliances are summarized in the table at right.
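
A back-of-envelope model helps show why forced sequential I/O matters so much. The sketch below is illustrative only and does not describe any vendor's implementation: the seek time, transfer rate and page size are assumed round numbers for a generic spinning disk, and the function names are hypothetical.

```python
# Rough cost model: reading a table page by page with a seek before each page,
# versus streaming it in one sequential pass. All figures are assumptions.

def random_read_seconds(total_bytes, page_bytes, seek_seconds, bytes_per_second):
    """Time to read total_bytes when every page requires its own seek."""
    pages = -(-total_bytes // page_bytes)  # ceiling division
    return pages * (seek_seconds + page_bytes / bytes_per_second)


def sequential_read_seconds(total_bytes, seek_seconds, bytes_per_second):
    """Time to read total_bytes in one pass after a single initial seek."""
    return seek_seconds + total_bytes / bytes_per_second


if __name__ == "__main__":
    # Assumed values, chosen only for illustration.
    table_bytes = 10**9       # a 1 GB table
    page_bytes = 8192         # 8 KB pages
    seek_seconds = 0.008      # 8 ms average seek
    bytes_per_second = 10**8  # 100 MB/s sustained transfer

    rand = random_read_seconds(table_bytes, page_bytes, seek_seconds, bytes_per_second)
    seq = sequential_read_seconds(table_bytes, seek_seconds, bytes_per_second)
    print(f"random:     {rand:.1f} s")
    print(f"sequential: {seq:.1f} s")
    print(f"ratio:      {rand / seq:.0f}x")
```

Under these assumed figures, the page-at-a-time read is dominated by seek time, which is exactly the cost that a sequential scan avoids. Real systems fall somewhere between the two extremes, depending on caching, data layout and how many queries compete for the same disks.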

Recommendations and Future Trends

One of the benefits of this disruptive technology is that it has shaken up the marketplace, and even established vendors are adjusting their pricing to compete in the new landscape. Appliance vendors aren't just taking a larger piece of the pie; they are making the pie bigger. Lower costs will make large data warehouses affordable for more organizations, provide a quicker ROI and give more users access to the heretofore restricted BI environment.

Even with a small market share (so far), the smaller start-up vendors have demonstrated that there's a market for preconfigured systems built with commodity-class hardware and open-source databases. This has not gone unnoticed. IBM has built its own appliance and last week it announced more hardware-based products and new software-only options. Oracle has joined with hardware partners to create "approved configurations," which can lower implementation costs. Teradata has always had an appliance-style, preconfigured system. Whether appliances will take over the market is unclear, but they've brought important focus to the "physics" of data warehouse systems, and as a result, all users will benefit.

Customers need to understand their workloads exceptionally well to make the right pick. Is the emphasis on enterprise reporting, data mining, tactical BI, continuous data loading, ad hoc queries? If it's a specific business function with its own data mart, the choice is simpler.

Enterprise data warehouses usually require systems that can handle mixed workloads or different combinations of these functions, so it's essential to choose hardware and software that are appropriately configured and balanced for the required tasks and workload mix. Conventional data warehouse and DBMS systems are highly configurable, so organizations can tune them for their own requirements and situation.

Appliances are well suited for large autonomous analytic applications that require little integration with the rest of the enterprise. The fit is even better when the application involves intense ad hoc queries that would put an extreme load on a conventional enterprise data warehouse. Furthermore, this kind of isolated analytic application can be a low-risk project for trying out a data appliance before deciding whether to use appliances more broadly. Data warehouse appliances also work well when projects call for fast deployment, low cost per terabyte and minimal system integration and administration. In these cases, appliances can offer remarkable cost savings and striking performance advantages.

Vendors will continue to improve appliance performance, enhancing their ability to handle mixed workloads, advancing algorithms, and continuing to scale up for ever-increasing data volumes. Some may explore pre-packaged applications to address specific types of analyses. All these features will expand the applicability and usefulness of appliances, and performance and scalability may trump cost as a primary motivation for choosing an appliance.

As in any new technology field, there are multiple small players as well as established companies. Not all will survive, so look at the company's financial attributes as well as its technology.

It's clear that data warehouse appliances are a force to be reckoned with; they can provide unparalleled performance at a low cost, thereby helping organizations become more competitive and more successful.

Brenda Castiel is a manager in Portfolio Development at EDS and is responsible for the firm's business intelligence offering.

Dan Brown is the Lead BI Technologist in EDS' Portfolio Development organization and is responsible for the BI offering's technical content.