I've written about the "category error" of looking at open source primarily as targeting end-user replacement of BI applications and established data warehouse platforms. OS's greatest BI/DW contribution to date has instead been in enabling developers. I'm more convinced than ever of this assessment, even as OS-BI vendors have launched improvements that target enterprise end users. On the DW front, here's why —

Seth Grimes, Contributor

July 15, 2008


I've written about the "category error" of looking at open source primarily as targeting end-user replacement of BI applications and established data warehouse platforms. I've long seen that OS's greatest BI/DW contribution has instead been in enabling developers to build BI into line-of-business applications and create specialized analytical tools. I'm more convinced than ever of this assessment, even as OS-BI vendors have launched improvements that target enterprise end users. On the DW front, the launch of Aster nCluster supports my point.

nCluster starts with PostgreSQL. According to Mayank Bawa, CEO and co-founder of Aster Data Systems, nCluster uses PostgreSQL as a data store on each node of a hardware cluster. Aster-built distributed database technology coordinates the nodes to deliver shared-nothing, massively parallel (MPP) database processing. According to Bawa, nCluster relies on "a series of patent-pending algorithms and processes that optimize the placement, partitioning, balancing, replication, and querying across a cluster of intelligent nodes." Bawa calls PostgreSQL "a very stable foundation/abstraction on which we build our algorithms."
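For readers who want a concrete picture of "shared-nothing" processing, here is a minimal Python sketch of the general idea: rows are hash-partitioned so that each node owns a disjoint slice, every node computes a partial aggregate over its own slice, and a coordinator merges the partial results. This is not Aster's method, which is patent-pending and not described in detail here. The node count, the function names, and the sample rows are all assumptions made for illustration, and the nodes are simulated as in-memory lists rather than PostgreSQL instances.

```python
from collections import defaultdict
from zlib import crc32

NUM_NODES = 4  # hypothetical cluster size, chosen only for illustration


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Pick the node that owns a row by hashing its partition key."""
    return crc32(key.encode("utf-8")) % num_nodes


def partition(rows, num_nodes: int = NUM_NODES):
    """Distribute (user, amount) rows so each node holds a disjoint slice."""
    nodes = [[] for _ in range(num_nodes)]
    for user, amount in rows:
        nodes[node_for(user, num_nodes)].append((user, amount))
    return nodes


def local_sum(node_rows):
    """Work one node does on its own data: a partial aggregate per user."""
    totals = defaultdict(int)
    for user, amount in node_rows:
        totals[user] += amount
    return dict(totals)


def scatter_gather_sum(nodes):
    """Coordinator step: collect each node's partial result and merge them.

    A real MPP system would run the per-node work in parallel; this sketch
    loops over the nodes sequentially to stay short.
    """
    merged = defaultdict(int)
    for node_rows in nodes:
        for user, subtotal in local_sum(node_rows).items():
            merged[user] += subtotal
    return dict(merged)


# Hypothetical sample data.
rows = [("ann", 5), ("bob", 7), ("ann", 3), ("cy", 1), ("bob", 2)]
nodes = partition(rows)
totals = scatter_gather_sum(nodes)
```

Because all rows for a given user hash to the same node, no node needs another node's data to compute its partial sums, which is the property the "shared-nothing" label refers to.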

PostgreSQL is, of course, a free-standing, open-source RDBMS. As I wrote back in June, a variety of organizations have taken advantage of its hyperfree, "Do with me what you will," BSD open-source license to, variously, build it up and strip it down. On the one hand, we have EnterpriseDB, whose aim seems to be to deliver better PostgreSQL than PostgreSQL.org does in the form of an enterprise-ready distribution with a set of integrated, open- and closed-source extensions. On the other, we have companies such as ParAccel, Netezza, and Greenplum that have taken those portions of the source code they need and stripped out the rest, building out from those PostgreSQL components into robust solutions for large-scale data warehousing. Those latter two companies have company in Dataupia and Truviso, and more power to 'em.

I asked Aster what differentiates nCluster from more established MPP systems such as Greenplum's, which also runs on commodity hardware. Bawa replied that "nCluster is different in that it efficiently optimizes network bandwidth for distributed analytics." I can't say I found his elaboration satisfying, but here it is:

If you look at the reference architectures of several alternatives, you will see that many tend to emphasize $/TB of disk (by using nodes with a large number of disks), at the expense of ... key metrics that relate to query performance and analytics. In contrast, the Aster nCluster achieves a much higher ratio of processing power and memory to disk, which is enabled by our network optimizations. With a more efficient network, we are able to spread our work across more nodes, which keeps those query performance ratios much more attractive.

Bawa pointed me for technical detail to a blog write-up by David Cheriton, an Aster investor, who leads the Distributed Systems Group at Stanford University.

Aster lists MySpace as a production customer with a 100-node cluster hosting over 100 TB of data with a terabyte of data added each day. The company claims other, not yet announced paying customers that include advertising networks, recommendation engines, and other social-networking companies.
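To put those MySpace figures in per-node terms, here is a rough back-of-the-envelope calculation. It assumes data is spread evenly across nodes, ignores any replication or compression, and uses decimal units (1 TB = 1,000 GB). None of those assumptions come from Aster, and since the cluster is said to host "over" 100 TB, the results are lower bounds.

```python
# Figures quoted for the MySpace deployment.
nodes = 100
total_tb = 100          # "over 100 TB", so treat as a lower bound
daily_growth_tb = 1

# Assumes an even spread across nodes and no replication (both assumptions).
tb_per_node = total_tb / nodes                        # 1.0 TB per node
daily_gb_per_node = daily_growth_tb * 1000 / nodes    # 10.0 GB per node per day
days_to_add_current_volume = total_tb / daily_growth_tb  # 100.0 days
```

On those assumptions, each node holds about a terabyte and takes on about 10 GB a day, which is modest per-node storage and consistent with Bawa's emphasis on processing power and memory relative to disk.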

Not every OS-reliant data warehousing vendor will succeed as a free-standing company. I guarantee we'll see vendor consolidation in the next year, even as new entrants emerge. Nonetheless, nCluster is yet more proof of the enormous value PostgreSQL (not even considering open-source MySQL, MonetDB, LucidDB, and Ingres) has to offer the data warehousing world.

About the Author(s)

Seth Grimes

Contributor

Seth Grimes is an analytics strategy consultant with Alta Plana and organizes the Sentiment Analysis Symposium. Follow him on Twitter at @sethgrimes
