Column stores and Census data: ParAccel and SuperSTAR

ParAccel has won well-deserved attention in recent months, including Intelligent Enterprise recognition as a Company to Watch. There's irony, however, in their market positioning. It's not that column stores, most notably SybaseIQ, have been around for decades. It's that ParAccel chose to explain their product with an application, analysis of U.S. Census data, that is essentially owned by a competing column-store system, SuperSTAR from Space-Time Research.

Seth Grimes, Contributor

January 8, 2008


ParAccel has won well-deserved attention in recent months, including Intelligent Enterprise recognition as a Company to Watch. They're a start-up that boasts an all-star cast of executives, positioned in a hot category, namely column-store DBMSes that are optimized for analytics. There's irony, however, in their market positioning. It's not that column stores, most notably SybaseIQ, have been around for decades. It's that ParAccel chose to explain their product with an application, analysis of U.S. Census data, that is essentially owned by a competing column-store system, SuperSTAR from Space-Time Research.

I have personal history here: I designed the U.S. Census Bureau's Census 2000 tabulation system, working on subcontract to IBM. Back in 1998, I wrapped up the selection of SuperSTAR over competing options. We chose SuperSTAR for superior performance and ease of use. I then led the development team that built a system supporting both ad hoc queries and the production of hundreds of billions of statistical tables for subsequent publication via the Census Bureau's American FactFinder Web site.

The Census Bureau and IBM chose SuperSTAR for the very reasons that Intelligent Enterprise cited in naming ParAccel a Company to Watch: "column-store databases are nothing new; it's well known that they offer super scalability and blazing query response in analytic applications." Like ParAccel today, STR ten years ago was "an upstart that's blowing by established price and performance benchmarks." STR has continued to improve the product, and the Census Bureau and IBM recently re-upped for analysis of the 2010 Census using SuperSTAR. (I left the project myself in 2002 after four and a half years.)

I haven't compared ParAccel to SuperSTAR, Vertica, SybaseIQ, MonetDB, Infobright, or other column-store DBMSes. Frankly, although my company has a business relationship with STR, I'm a consultant and would happily work with any of them, whichever best suited the project and customer. For Census data, however, no other product I know of matches SuperSTAR's capabilities for handling important requirements such as confidentiality protection; the ability to roll up multiply branching geographic hierarchies, up to 9 levels deep at Census; support for hierarchical datasets such as the Census's, which represent data by geographic area, household, and person; built-in support for "multi-response" data, which at the bureau accommodates people who identify themselves in more than one racial category; and both a GUI and a non-graphical interface for automating large-scale production tabulations.
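To make two of those requirements concrete, here is a minimal Python sketch of multi-response tabulation with a geographic roll-up. It is not SuperSTAR's method or interface, which this article does not describe. The records, the two-level state/county hierarchy, and the `tabulate` helper are all invented for illustration.

```python
from collections import Counter

# Hypothetical person records: (state, county, set of race categories).
# Every name and value here is made up for illustration only.
PERSONS = [
    ("MD", "Montgomery", {"White"}),
    ("MD", "Montgomery", {"White", "Asian"}),
    ("MD", "Howard", {"Black"}),
    ("VA", "Fairfax", {"Asian", "Black"}),
]


def tabulate(persons, level):
    """Count race responses per geographic area.

    level=1 groups by state; level=2 groups by (state, county).
    A person who reports two categories is counted once in each.
    """
    counts = Counter()
    for *geo, races in persons:
        area = tuple(geo[:level])  # truncate the path to roll up
        for race in races:
            counts[(area, race)] += 1
    return counts


by_county = tabulate(PERSONS, level=2)
by_state = tabulate(PERSONS, level=1)

# Responses can exceed persons once multi-response is allowed.
md_responses = sum(n for (area, _), n in by_state.items() if area == ("MD",))
md_persons = sum(1 for state, _, _ in PERSONS if state == "MD")
```

In this toy data, Maryland has three persons but four race responses, so cell counts in a multi-response table do not sum to the population. A real system also has to handle far deeper hierarchies that branch in more than one way, plus confidentiality protection, none of which this sketch attempts.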

The first lesson here is that a conceptually simple illustration, for instance ParAccel's Census-analysis example, may not be so simple in real life. Hot technology such as a column-store DBMS isn't enough: applications that require high performance tend also to require other specialized capabilities. The second lesson is that if applying established technology to a problem seems like a good idea, there's a good chance that someone else got there first.

Seth Grimes is an analytics strategist with Washington, DC-based Alta Plana Corporation. He consults on data management and analysis systems.


About the Author(s)

Seth Grimes

Contributor

Seth Grimes is an analytics strategy consultant with Alta Plana and organizes the Sentiment Analysis Symposium. Follow him on Twitter at @sethgrimes
