Database guru Michael Stonebraker recently touted the benefits of column-oriented databases as offered by his start-up, Vertica. But most major database vendors are ignoring the idea or, in IBM's case, actively pooh-poohing it.
However, Sybase, the also-ran in the database field, is proving you don't have to be a start-up to capitalize on a new approach.
Sybase revenues grew to $1.026 billion in 2007, and not just on the strength of its standard relational system, Sybase Adaptive Server Enterprise, or its sales of mobile device databases, a small market segment that it leads. Rather, its hidden strength is a column-oriented database for data warehouse systems, Sybase IQ, which has existed for 10 years and seemed to catch fire in 2007.
"Sybase IQ revenues were up 70% last year," said Richard Pledereder, VP of engineering. He thinks that's because the column approach yields better query performance, and "we've been doing this for quite some time -- since the mid-1990s."
And "if you look at Michael Stonebraker's papers on column-oriented systems, they frequently refer to Sybase IQ," he said. Sybase now claims 1,200 Sybase IQ customers. IQ runs large data warehouses powered by big, multiprocessor servers. With IQ priced at $45,000 per CPU, those customers now account for a significant share of Sybase's revenues, although the company won't break down revenues by market segment.
Scott Smith, director of data warehousing at ComScore, an analysis service for evaluating visitor activity on Web sites, said Sybase has "done a good job of putting a front end" on its IQ column-oriented system to make it look and feel much like a traditional relational database. "Their goal was to not freak people out," he said in an interview.
A column-oriented database handles SQL queries without modification, but the database administrator has to think differently about the data. He has to be column-oriented himself and think in terms of collections of similar records derived across sets of transactions, instead of individual transactions themselves.
"You have to know your data a little better. You have to think differently from the indexing perspective," Smith said, since the index will be organized more by subject, such as "sales amount," than by a transaction with date, customer name, and zip code as well as sales amount. Smith calls it paying attention to the "cardinality" of the data.
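To make the row-versus-column distinction concrete, here is a minimal Python sketch. It is not how Sybase IQ is implemented; the sample records, field names, and the helpers `to_columns` and `cardinality` are all hypothetical, chosen only to show the same data laid out by column and what "cardinality" (the number of distinct values in a column) means in practice.

```python
# Hypothetical sales records; the field names and values are illustrative only.
rows = [
    {"date": "2008-01-02", "customer": "Acme", "zip": "10001", "sales_amount": 120.0},
    {"date": "2008-01-02", "customer": "Birch", "zip": "94105", "sales_amount": 75.5},
    {"date": "2008-01-03", "customer": "Acme", "zip": "10001", "sales_amount": 120.0},
    {"date": "2008-01-03", "customer": "Cedar", "zip": "60601", "sales_amount": 42.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict mapping column name to a list of values."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def cardinality(column):
    """Return the number of distinct values in a column."""
    return len(set(column))


columns = to_columns(rows)

# An aggregate over one attribute reads a single column,
# not every field of every transaction.
total_sales = sum(columns["sales_amount"])

# Low-cardinality columns (few distinct values) are the ones an
# administrator would treat differently when planning indexes.
cardinalities = {name: cardinality(values) for name, values in columns.items()}
```

In this toy data set the "date" column has only two distinct values while the other columns have three each, which is the kind of difference Smith is describing when he talks about knowing the data better.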
Smith has been using IQ for both a 10 Tbyte and a 29-35 Tbyte data warehouse at ComScore for 7.5 years, but he doesn't rely on Sybase as his traditional relational database vendor. For that, he turns to Microsoft's SQL Server. But Microsoft has no column-oriented offering, nor any known research on column-oriented databases that might lead to a product. Microsoft declined to comment on the pros and cons of column-oriented systems, as did Oracle.
IBM's Anant Jhingran, VP and CTO of IBM's Information Management unit, which includes DB2, has publicly disputed Stonebraker's assertions that a column-oriented design leads to better performance in data warehousing. That's ironic because Stonebraker was Jhingran's academic advisor when Jhingran was a PhD student at the University of California at Berkeley.
"I would love to agree with him," Jhingran began cautiously, "because I really like Streambase," referring to another Stonebraker start-up. Streambase produces software that does complex event processing.
But Jhingran doesn't agree on column-oriented. "Query performance, while interesting, is just 20% of the story," he said. "Our clients say, 'My god, is 20% performance worth sacrificing all the other gains'" of traditional database systems. They include back-end connectivity to applications and other sources of data, the enterprise's existing investment in trained database administrators, and the ability to move relational data around freely in predictable ways.
By the time all factors are considered, the column-oriented system's performance gain amounts to just 10%, he asserted. "How much complexity is justified to get a 10% improvement?" he asked.
Stonebraker asserts that a 50X improvement in query performance is possible and worth pursuing in hardworking data warehouse systems.
"When you have billion-row tables because of the amount of data, there's a flow problem," said data warehouse administrator Smith at ComScore. "The amount of data that needs to be streamed is more like a fire hose. IQ is the only thing that can keep up."
Stonebraker acknowledged that Sybase IQ was one of the first such commercial systems. But he said Vertica has come up with its own column-oriented design and is trying to optimize the gains that can be realized through compression and auxiliary approaches to the design.
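One reason compression pairs naturally with column storage is that values in a single column tend to repeat. The Python sketch below illustrates this with run-length encoding, a generic technique; the article does not say which compression schemes Vertica uses, so treat this as an assumption made for illustration. The sample `region` column and the helper names `rle_encode` and `rle_decode` are hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse each run of identical adjacent values into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    column = []
    for value, count in pairs:
        column.extend([value] * count)
    return column


# Hypothetical low-cardinality column, stored contiguously as a column store would.
region = ["east"] * 5 + ["west"] * 3 + ["east"] * 2

encoded = rle_encode(region)
```

Here ten stored values shrink to three pairs, and decoding restores the column exactly. In a row-oriented layout the same values would be interleaved with dates, names, and amounts, so long runs like these would rarely appear.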
Vertica is based on an open source column-oriented system, C-Store. MonetDB is another open source column-oriented project; it claims a 10X improvement in query speed for SQL and for XQuery over XML data.
Netezza, the maker of a data warehouse appliance, and ParAccel, supplier of a data warehouse system, also use column-oriented databases in their products.