For those who don't know of Stonebraker, here's a short bio. He was the main architect of the Ingres relational DBMS and the object-relational Postgres DBMS, both of which were developed at the University of California at Berkeley, where Stonebraker was a computer science professor for 25 years. More recently at MIT, Stonebraker was a co-architect of the Aurora stream processing engine as well as the C-Store high-performance read-oriented database engine. He is the founder of four startups that have commercialized these prototypes: Ingres Corp., Illustra (acquired by Informix before the latter was acquired by IBM), StreamBase and, most recently, Vertica.
Vertica is not alone in taking the column-store approach; it's also the basis for Sybase IQ and focused solutions from players such as Sand, KX Systems and Alterian. Competition using the conventional row-store approach includes Netezza and Teradata, both of which achieve high scalability and performance by exploiting warehouse-optimized, proprietary hardware. By far the biggest competitor, however, is Oracle and what Stonebraker calls "the other Elephants offering 30-year-old technology."
All roads lead to the column-store approach, he says, because nearly all data warehouses are growing faster than disk prices are dropping. "Database query times increase as a square of the database size, and there's an incessant desire to shove more data into every warehouse," he explains. "As a result, warehouse problems are getting harder, not easier, and the pain is apparent on the face of every warehouse administrator."
As warehouses built on the likes of SQL Server, MySQL and Postgres move into the terabyte range, organizations face "a fork-lift upgrade into Oracle/Sun," he says, and when that deployment "runs out of gas at 5 to 10 terabytes, they're forced to consider Netezza or Teradata."
Many very-large warehouses run just fine on conventional technology if they're only dealing with predictable reporting requirements, says Stonebraker, but as query complexity grows and as ad hoc demands multiply, things start to grind to a halt. He offers the example of a large retailer trying to provision stores in Florida during the hurricane season.
"You want to determine sales by department the week before and the week after a storm in affected areas and compare that with stores in Georgia so you can prepare for the next storm," he says. "That might take hours, or, as often as not, a DBA might refuse to run that sort of query because it's too computationally intensive for the resources available."
What difference does the column-store approach make? While row-store technology would be forced to compute across all, say, 50 columns in a fact table to handle the Florida hurricane query example above, a column-store engine would pick out only those facts/columns relevant to the query - "maybe three or four" - and it could "beat any row-store technology by a factor of 50," he says.
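To make the row-versus-column distinction concrete, here is a minimal Python sketch. It is a toy model, not Vertica's engine: the 50-column table, the column names and the choice of three "needed" columns are all assumptions made for illustration. It only counts how many stored values each layout has to touch to answer the same query; it says nothing about compression, disk layout or the other factors behind the "factor of 50" claim.

```python
# Toy comparison of row-store vs. column-store scans.
# All names and sizes below are hypothetical, chosen only for illustration.

NUM_ROWS = 1000
COLUMNS = [f"col{i}" for i in range(50)]   # assumed 50-column fact table
NEEDED = ["col0", "col1", "col2"]          # assumed columns the query uses

# Row layout: each record stores all 50 values together.
rows = [[r * 100 + c for c in range(len(COLUMNS))] for r in range(NUM_ROWS)]

# Column layout: one contiguous list per column.
columns = {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def row_store_scan(rows, needed_idx):
    """Sum the needed columns, reading every full record along the way."""
    touched = 0
    total = 0
    for row in rows:
        touched += len(row)                      # the whole record is read
        total += sum(row[i] for i in needed_idx)
    return total, touched


def column_store_scan(columns, needed):
    """Sum the needed columns, reading only those columns."""
    touched = 0
    total = 0
    for name in needed:
        values = columns[name]
        touched += len(values)                   # only this column is read
        total += sum(values)
    return total, touched


needed_idx = [COLUMNS.index(name) for name in NEEDED]
row_total, row_touched = row_store_scan(rows, needed_idx)
col_total, col_touched = column_store_scan(columns, NEEDED)

print("same answer:", row_total == col_total)
print("values touched (row store):   ", row_touched)
print("values touched (column store):", col_touched)
```

Both scans return the same total, but in this toy setup the row layout touches every value in the table while the column layout touches only the three columns the query asks for. Real-world speedups depend on much more than this count.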
Stonebraker makes a convincing case, but keep in mind that he's speaking as the CTO and co-founder of a vendor (who just happens to be an industry legend). When it comes to promoting, he's as good as any chief marketing officer I've met. Keep in mind, too, that Vertica just started shipping its software early this year. There are already some 50 customers, he claims, but JP Morgan Chase is the only customer that can be named (although with no detail available on what they're attempting to do or what they've achieved).
To fill you in on some of the product details, Vertica runs on Linux on any grid/blade system (HP, Dell, IBM, etc.). Pricing is based on the capacity of the system, starting at roughly $100,000 for the first terabyte (for the software only) and declining with each additional terabyte.
Given that Vertica has raised nearly $25 million in financing, I expect them to be a contender, soon to be mentioned in the same breath with the likes of DATAllegro, Netezza, Sybase IQ, HP and Teradata.
I had a long briefing with database legend Michael Stonebraker today, and I feel compelled to share a few highlights of the conversation. Stonebraker is known as a visionary, and he has consistently turned those visions into long-term bets through commercial startups. Today's prediction? "Sooner or later, the entire data warehousing market is going to move to column-store solutions," Stonebraker asserts, column-store being the architectural basis of his latest venture, a startup called Vertica.