Given the strengths of open source database management systems (DBMSes), open source seems like a natural platform for data warehousing. We've seen a number of success stories over the last few years, among them Travelocity, O'Reilly, FTD, and Frontier Airlines, but the roster of case studies is mighty thin. Then again, I've only recently (re-)started looking; in the last couple of years, my open-source coverage has focused mostly on BI (e.g., in May and March). I hope to find many more examples for a report I am planning on open source (based) data warehousing (OSDW). Additional OSDW implementations are surely out there. My confidence stems from industry developments since I last looked closely at the OSDW topic, including:
- The open sourcing of the venerable Ingres DBMS by Computer Associates in 2005 and its subsequent spin-off of Ingres as an independent company;
- The emergence of a number of DW/BI appliance deals that rely on an open source DBMS, notably Ingres-DATAllegro and PostgreSQL-based Greenplum with Sun Microsystems (Netezza's DBMS was originally based on PostgreSQL, but the company has since moved away from open source);
- Last summer's entry of Oracle-compatible EnterpriseDB into the DW space, supported by enhancements such as a shared-nothing distributed-database architecture, which CTO Bob Zurek says was developed as an answer to Oracle Real Application Clusters (RAC). Zurek pointed me to a case study claiming extreme savings in FTD's move to PostgreSQL-based EnterpriseDB; and
- MySQL's maturation as a DW platform, for instance with the addition of partitioning and strengthened clustering in the forthcoming 5.1 release.
But I'd like to hear from actual users and developers, not just the vendors. I will cite noteworthy examples in my planned report and, in light of a look back at twenty years of data warehousing and of open source, use them to consider OSDW best practices.
It's important to recognize that the open-source world has grown increasingly complex, with
- Commercial packagings and extensions of (otherwise) open-source software such as PostgreSQL;
- The inclusion of open-source software on commercial platforms such as DATAllegro's;
- Open-source Linux the operating system of choice for a decidedly commercial vendor, Oracle, which even provides its own repackaging of the commercial Red Hat Enterprise Linux distribution, branded as Oracle Unbreakable Linux; and
- That same hyper-commercial Oracle the owner of the open-source Berkeley DB embeddable DBMS, acquired via its purchase of Sleepycat Software, and of InnoDB, the open-source transactional storage engine used by MySQL.
This complexity is both a response to market forces and conditions and a significant benefit to the market, namely to all of us DBMS (and other enterprise software) users. It is a sign of vibrancy and possibility. Clearly the time is right for a comprehensive, objective look at open source data warehousing.
Seth Grimes is an analytics strategist with Washington DC based Alta Plana Corporation. He uses and consults on open source data warehousing and analytics.