11/27/2007
08:14 PM
Seth Grimes

(How) Has Open Source Data Warehousing Developed?


Given the strengths of open source database management systems (DBMSes), open source seems like a natural platform for data warehousing. We've seen a number of success stories over the last few years, among them Travelocity, O'Reilly, FTD, and Frontier Airlines, but the roster of case studies is mighty thin. However, I've only recently (re)started looking. In the last couple of years, my open source coverage has focused mostly on BI (e.g., my May and March columns). I hope to find many more examples for a report I am planning on open source (based) data warehousing (OSDW).

Additional OSDW implementations are surely out there. My confidence stems from several industry developments since I last looked closely at the OSDW topic, including:

  • The open sourcing of the venerable Ingres DBMS by Computer Associates and CA's subsequent 2005 spin-off of Ingres as an independent company;
  • The emergence of a number of DW/BI appliance deals that rely on an open source DBMS, notably Ingres-DATAllegro and PostgreSQL-based Greenplum with Sun Microsystems (Netezza's DBMS was originally based on PostgreSQL, but the company has moved away from open source);
  • Last summer's entry by Oracle-compatible EnterpriseDB into the DW space, supported by enhancements such as a shared-nothing distributed-database architecture that, according to CTO Bob Zurek, was developed as an answer to Oracle Real Application Clusters (RAC). Zurek also pointed me to a case study claiming extreme savings from FTD's move to PostgreSQL-based EnterpriseDB; and
  • MySQL's maturation as a DW platform, for instance with the addition of partitioning and strengthened clustering in the forthcoming 5.1 release.

But I'd like to hear from actual users and developers, not just vendors. I will cite noteworthy examples in my planned report and, drawing on a look back at twenty years of data warehousing and of open source, use them to assess OSDW best practices.

It's important to recognize that the world of open source has grown increasingly complex, with:

  • Commercial packagings and extensions of (otherwise) open-source software such as PostgreSQL;
  • The inclusion of open-source software on commercial platforms such as DATAllegro's;
  • Open-source Linux as the operating system of choice for a decidedly commercial vendor, Oracle, which even provides its own repackaging of the commercial Red Hat Enterprise Linux distribution, branded as Oracle Unbreakable Linux; and
  • That same hyper-commercial Oracle as the owner of both the open-source Berkeley DB embeddable DBMS, acquired via its purchase of Sleepycat Software, and the open-source InnoDB transactional storage engine used by MySQL.

This complexity is both a response to market forces and conditions and a significant benefit to the market, namely to all of us DBMS (and other enterprise software) users. It is a sign of vibrancy and possibility. Clearly the time is right for a comprehensive, objective look at open source data warehousing.


Seth Grimes is an analytics strategist with Washington, DC-based Alta Plana Corporation. He uses and consults on open source data warehousing and analytics.
