Of 'Elephants,' Column-Store Databases and the Von Neumann Architecture - InformationWeek
Commentary
11/5/2008
09:13 AM
Rajan Chandras

Listening to Dr. Michael Stonebraker extol the virtues of column-store databases... it's becoming clear that a new data storage architecture is the need of the day... Stonebraker also seemed to imply that column-store databases are wonderful not just for data warehouses, they are pretty good for conventional (transactional) uses as well. That, of course, doesn't seem right...

Listening in to Dr. Michael Stonebraker decry "elephants" and extol the virtues of column-store databases in general and Vertica in particular, it's becoming clear that a totally new data storage architecture is the need of the day.

Dr. Stonebraker is, of course, a venerable figure in the world of databases, best known for his pioneering work on Ingres at UC Berkeley more than a quarter century ago. These days, however, as CTO of Vertica, he can hardly be expected to speak impartially on the topic. In a recent presentation on Vertica, Dr. Stonebraker didn't actually call the leading relational database vendors (Oracle, IBM, Microsoft) "large, lumbering and slow." He did, however, repeatedly refer to them as "elephants." Very clever.

You probably know about column-store databases and about Vertica, so I won't go into much detail here; IntelligentEnterprise.com has plenty of information to offer (check this update, this trend article and this blog).

Here's what's interesting. Towards the end of the presentation, I thought I heard Dr. Stonebraker state, or at least clearly imply, that column-store databases are wonderful not just for data warehouses but pretty good for conventional (transactional) uses as well. That, of course, doesn't seem right. The central premise of all conventional relational databases is to store the entire row on a single database "page," as far as possible. This makes storing and retrieving a single row of data (i.e., a single tuple or entity instance) efficient, which in turn suits systems that read or write transactions, since one transaction typically deals with a single entity instance: one customer order, one invoice, or, for that matter, a single customer. Hence, careful planning around row size and page size is a key part of optimizing a database design.
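
To make the row-size/page-size trade-off concrete, here is a minimal Python sketch. It assumes fixed-width rows and a hypothetical 8 KB page with no header overhead; the page size, the function names and the example widths are illustrative assumptions, not the layout of any particular product.

```python
# Hypothetical page size for illustration only; real engines reserve space for
# page headers, slot directories and free-space tracking, so capacity is lower.
PAGE_SIZE = 8192  # bytes per page (assumed)


def rows_per_page(row_width: int, page_size: int = PAGE_SIZE) -> int:
    """Number of whole rows that fit on one page in a row-oriented layout."""
    if row_width <= 0 or row_width > page_size:
        raise ValueError("row width must be positive and no larger than a page")
    return page_size // row_width


def page_of_row(row_id: int, row_width: int, page_size: int = PAGE_SIZE) -> int:
    """Page number holding a given row; the whole row lives on that one page."""
    return row_id // rows_per_page(row_width, page_size)


def wasted_bytes_per_page(row_width: int, page_size: int = PAGE_SIZE) -> int:
    """Bytes left unused on each page, since this model never splits a row."""
    return page_size - rows_per_page(row_width, page_size) * row_width


if __name__ == "__main__":
    # A 300-byte row packs tightly; a 4,100-byte row leaves about half of every page empty.
    for width in (300, 4100):
        print(width, rows_per_page(width), wasted_bytes_per_page(width))
```

Under these assumptions a 300-byte row fits 27 to a page with 92 bytes to spare, while a 4,100-byte row fits only once and leaves 4,092 bytes unused, which is why row width matters so much in a row-oriented design. Reading any single row, meanwhile, touches exactly one page.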

This strength of conventional databases turns into a weakness for large, star-join sorts of queries, since the typical data warehouse query only needs to look at a few columns rather than the entire row of data (specifically, the columns in the SELECT and WHERE clauses). That's where column-store databases get their strength: because they store data by column, each page holds values from a single column, organized in some sort order. Queries now need to read fewer pages to get all the values they need, and sorting and matching are faster.
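
A back-of-the-envelope Python sketch can show why reading only the referenced columns pays off. It assumes a hypothetical fixed-width "orders" schema, a hypothetical 8 KB page, and a full scan of one million rows; it deliberately ignores compression, indexes and page overhead, so the numbers are illustrative rather than a model of Vertica or any other product.

```python
import math

PAGE_SIZE = 8192  # bytes per page (assumed)

# Hypothetical fixed-width "orders" table: column name -> bytes per value.
ORDERS = {
    "order_id": 8,
    "customer_id": 8,
    "order_date": 4,
    "amount": 8,
    "status": 4,
    "ship_address": 200,
    "notes": 256,
}


def row_store_scan_pages(schema: dict[str, int], n_rows: int,
                         page_size: int = PAGE_SIZE) -> int:
    """Pages read by a full scan when whole rows are stored together."""
    rows_per_page = page_size // sum(schema.values())
    # Every page must be read, however few columns the query references.
    return math.ceil(n_rows / rows_per_page)


def column_store_scan_pages(schema: dict[str, int], n_rows: int,
                            columns: list[str], page_size: int = PAGE_SIZE) -> int:
    """Pages read when each column is stored in its own pages."""
    total = 0
    # A column named in both SELECT and WHERE is read only once.
    for col in set(columns):
        values_per_page = page_size // schema[col]
        total += math.ceil(n_rows / values_per_page)
    return total


if __name__ == "__main__":
    n = 1_000_000
    # e.g. SELECT SUM(amount) FROM orders WHERE order_date >= ...
    print(row_store_scan_pages(ORDERS, n))
    print(column_store_scan_pages(ORDERS, n, ["amount", "order_date"]))
```

With these made-up widths, the row layout scans 62,500 pages, while the column layout reads only the `amount` and `order_date` pages: 1,466 in total.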

Consider what happens when we use a column-store database to read a single transaction: say, that customer master record or that customer order. The data is now spread across many pages (at least one per column), and reading the transaction suddenly becomes much less efficient. Now imagine a large-scale OLTP system. It's not clear how column-store databases will cater to this need. Row-ordered or column-ordered, there's no getting away from the Yin and Yang of database organization.
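
The single-record case can be sketched the same rough way. This Python example assumes a hypothetical fixed-width "orders" schema and a hypothetical 8 KB page, and counts the distinct pages that must be touched to read (or write) one complete record in each layout; it ignores buffering, indexes and the write-optimized staging areas some column stores use.

```python
PAGE_SIZE = 8192  # bytes per page (assumed)

# Hypothetical fixed-width "orders" table: column name -> bytes per value.
ORDERS = {
    "order_id": 8,
    "customer_id": 8,
    "order_date": 4,
    "amount": 8,
    "status": 4,
    "ship_address": 200,
    "notes": 256,
}


def row_store_pages_for_record(schema: dict[str, int], row_id: int,
                               page_size: int = PAGE_SIZE) -> set[tuple[str, int]]:
    """(segment, page) pairs touched for one complete record in a row layout."""
    rows_per_page = page_size // sum(schema.values())
    # The whole record sits on a single page.
    return {("rows", row_id // rows_per_page)}


def column_store_pages_for_record(schema: dict[str, int], row_id: int,
                                  page_size: int = PAGE_SIZE) -> set[tuple[str, int]]:
    """(column, page) pairs touched for one complete record in a column layout."""
    # Each column keeps its own pages, so every column contributes one page.
    return {(col, row_id // (page_size // width)) for col, width in schema.items()}


if __name__ == "__main__":
    order = 123_456
    print(len(row_store_pages_for_record(ORDERS, order)))
    print(len(column_store_pages_for_record(ORDERS, order)))
```

Here one order costs a single page in the row layout but seven pages (one per column) in the column layout, and the gap grows with the number of columns. Multiplied across thousands of small OLTP transactions per second, that is the inefficiency described above.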

This reminded me (rather laterally) of the Von Neumann single-instruction-single-data (SISD) bottleneck. How fast can you process data if you are constrained to apply each instruction to a single piece of data, one at a time? Subsequent architectures, such as vector processing (SIMD) and parallel processing (MIMD, whether small-scale clustering or large-scale parallelism), got around the bottleneck through a fundamental shift in paradigm.

Similarly, we need an equally fundamental shift in database storage architecture, one that takes us past two critical bottlenecks in database organization and performance that exist today:

  • Differences in performance between, say, row-order and column-order databases
  • The need to physically replicate data merely to use it in two different situations

This is interesting and highly pertinent stuff. Stay tuned for more in the future. Your own insight is also invited.
