Hardware & Infrastructure
News
2/21/2008
06:15 PM
Connect Directly
Twitter
RSS
E-Mail
50%
50%

Database Pioneer Rethinks The Best Way To Organize Data

Vertica Systems' Stonebraker says "column" design can speed data warehouse performance.

Is there a better way to build a data warehouse? For years, relational databases--which organize data in tables composed of vertical columns and horizontal rows--have served as the foundation of data warehouses. Now database pioneer Michael Stonebraker is promoting a different way of organizing them, promising much faster response times.

InformationWeek Reports

DB2, MySQL, Oracle, SQL Server, Sybase, and Teradata all have rows as their central design point. Yet in data warehousing, faster performance is gained through a column layout, maintains Stonebraker, CTO of startup Vertica Systems, which last week introduced its Vertica 2.0 column-based database, due for general release in March. Stonebraker says all types of queries on "most data warehouses" will run up to 50 times faster in a column database. The bigger the data warehouse, the greater the performance gain.

While a scientist at the University of California at Berkeley in the 1970s, Stonebraker was one of the original architects of the Ingres relational database, which spawned several commercial variants. A row-based system like Ingres is great for executing transactions, but a column-oriented system is a more natural fit for data warehouses, Stonebraker now says.

Stonebraker finds speed in columns

Stonebraker finds speed in columns

Why? Data warehouses frequently store transactional data, with each transaction having many parts. Columns cut across transactions and store an element of information that's standard to each transaction, such as customer name, address, or purchase amount.

A row, by comparison, may hold 20 to 200 different elements of a transaction. A standard relational database would retrieve all the rows reflecting, say, sales for a month, load the data into system memory, then find all sales records and generate an average from them. Being able to focus on just the "sales" column leads to improved query performance.

There's a second performance benefit in the column approach. Because columns contain similar information from each transaction, it's possible to derive a compression scheme for the data type, then apply it throughout the column. Rows can't be compressed as easily because the nature of the data--name, ZIP code, and account balance, for example--varies from record to record. Each row would require a different compression scheme.

Compressing data in columns makes for faster storage and retrieval and reduces the amount of disk required. "In every data warehouse I see, compression is a good thing," Stonebraker says.

TEST DRIVE

Twenty customers have begun using Vertica's database since it was released in September, says CEO Ralph Breslauer. One of them is Sonian Networks, a 10-employee company that archives e-mail for other businesses. Sonian CTO Greg Arnette says he considered PostgreSQL, MySQL, and Oracle for business intelligence but opted for Vertica, based in part on "Stonebraker's reputation."

Arnette expects Sonian's e-mail archive to grow from a few terabytes to a petabyte sometime next year. "We needed a system that was optimized for high-volume reporting," Arnette says. In tests conducted by Sonian, Vertica was "a lot faster" than MySQL and required less CPU power, he says.

Sonian's e-mail archive and Vertica data warehouse are hosted on Amazon.com's Simple Storage Service and computing service. Sonian's business customers are able to access reports generated by its data warehouse.

IBM VP Anant Jhingran, in a blog posting last fall, called the column database concept "academically interesting," but suggested that APIs and integration issues stand in the way of widespread adoption. But, with $23.5 million in venture capital funding, Vertica has grown to 50 employees. Says Stonebraker, "I expect the data warehouse market to become completely column-store based."

Comment  | 
Print  | 
More Insights
Register for InformationWeek Newsletters
White Papers
Current Issue
InformationWeek Tech Digest, Nov. 10, 2014
Just 30% of respondents to our new survey say their companies are very or extremely effective at identifying critical data and analyzing it to make decisions, down from 42% in 2013. What gives?
Video
Slideshows
Twitter Feed
InformationWeek Radio
Archived InformationWeek Radio
Join us for a roundup of the top stories on InformationWeek.com for the week of November 16, 2014.
Sponsored Live Streaming Video
Everything You've Been Told About Mobility Is Wrong
Attend this video symposium with Sean Wisdom, Global Director of Mobility Solutions, and learn about how you can harness powerful new products to mobilize your business potential.