PostgreSQL is, of course, a free-standing, open-source RDBMS. As I wrote back in June, a variety of organizations have taken advantage of its hyperfree, "Do with me what you will," BSD open-source license to, variously, build it up and strip it down. On the one hand, we have EnterpriseDB, whose aim seems to be to deliver better PostgreSQL than PostgreSQL.org does in the form of an enterprise-ready distribution with a set of integrated, open- and closed-source extensions. On the other, we have companies such as ParAccel, Netezza, and Greenplum that have taken those portions of the source code they need and stripped out the rest, building out from those PostgreSQL components into robust solutions for large-scale data warehousing. Those latter two companies have company in Dataupia and Truviso, and more power to 'em.
I asked Aster what differentiates nCluster from more established MPP systems such as Greenplum's, which also runs on commodity hardware. CEO Mayank Bawa replied that "nCluster is different in that it efficiently optimizes network bandwidth for distributed analytics." I can't say his elaboration was satisfying, but here's more:
If you look at the reference architectures of several alternatives, you will see that many tend to emphasize $/TB of disk (by using nodes with a large number of disks), at the expense of ... key metrics that relate to query performance and analytics. In contrast, the Aster nCluster achieves a much higher ratio of processing power and memory to disk, which is enabled by our network optimizations. With a more efficient network, we are able to spread our work across more nodes, which keeps those query performance ratios much more attractive.
Aster lists MySpace as a production customer with a 100-node cluster hosting over 100 TB of data, with a terabyte of data added each day. The company claims other paying customers, not yet announced, that include advertising networks, recommendation engines, and other social-networking companies.
Not every data warehousing vendor that relies on open source (OS) will succeed as a free-standing company. I guarantee we'll see vendor consolidation in the next year, even as new entrants emerge. Nonetheless, nCluster is yet more proof of the enormous value PostgreSQL has to offer the data warehousing world, and that's not even considering open-source MySQL, MonetDB, LucidDB, and Ingres.
I've written about the "category error" of looking at open source primarily as targeting end-user replacement of BI applications and established data warehouse platforms. OS's greatest BI/DW contribution to date has instead been in enabling developers. I'm more convinced than ever of this assessment, even as OS-BI vendors have launched improvements that target enterprise end users. On the DW front, here's why: