3 min read

Reinvent The Data Warehouse With Column-Store Databases And Appliances

These approaches beat out conventional databases in price and performance. Here's how to find products that fit with your company's data analysis needs.

When evaluating an appliance or column-store database, consider whether you're replacing the EDW or off-loading complex, data-intensive analytic queries to improve performance, delay an enterprise data warehouse upgrade, or both. If it's the former, keep in mind that column-store products aren't suitable for row-intensive queries involving many attributes. Does your EDW have to support thousands of users with a mix of query types? Not all appliances can do that. Teradata's flagship product, IBM's InfoSphere Balanced Warehouse E-Class, Hewlett-Packard's Neoview, DATAllegro v3, and Greenplum's G3 are all technically suitable for EDW deployment, though HP, DATAllegro, and Greenplum have to use partners to match Teradata's or IBM's data integration and information management software.

Appliances and column-store databases are great for analytic data marts and focused warehouses tackling complex queries and extreme data volumes. Case in point, NYSE Euronext, the parent company of the New York Stock Exchange, has replaced three aging 100-TB warehouses built on Oracle databases with three Netezza Performance Servers. Complex queries that took as long as 26 hours in the conventional warehouse now take two and a half minutes, while simple queries that took seven minutes now take five seconds, says Steve Hirsch, chief data officer.

However, these aren't typical EDWs. Loads are limited to hundreds of queries per day, he says, and the user community tops out at 150 per appliance, with only 20 concurrent users on a device. One of the broadest Netezza deployments has more than 600 users and 20,000 queries per day, Netezza says, but it's not unusual for large EDWs to have tens of thousands of concurrent users and hundreds of thousands of queries a day.

While not massive in terms of its user base, Euronext does do complex analytic calculations. "It's very possible that we could hit 40 to 50 TB of data in a single query," Hirsch says.

Once you understand your data warehousing needs, including the number of users and the depth, diversity, and volume of queries, you'll be ready to wade through all the confusing vendor claims. And when you get down to the finalists, insist on a proof-of-concept deployment.

"Don't trust the FUD or the TCP-H benchmark statistics or anything else," Gartner analyst Donald Feinberg says. "You need to put your data out there and run your queries and see which alternative is faster in your application." That's the only way to know for sure that you've made the right choice.