Greenplum's Announcement and the Future of Data Marts - InformationWeek
09:20 AM
Curt Monash

Greenplum's Announcement and the Future of Data Marts


Greenplum is announcing today a long-term vision, under the name Enterprise Data Cloud (EDC). Key observations around the concept -- mixing my own with Greenplum's -- include:

  • Data marts aren't just for performance (or price/performance). They also exist to give individual analysts or small teams control of their analytic destiny.
  • Thus, it would be really cool if business users could have their own analytic "sandboxes" -- virtual or physical analytic databases that they can manipulate without breaking anything else.
  • In any case, business users want to analyze data when they want to analyze it. It is often unwise to ask business users to postpone analysis until after an enterprise data model can be extended to fully incorporate the new data they want to look at.
  • Whether or not you agree with that, it's an empirical fact that enterprises have many legacy data marts (or even, especially due to M&A, multiple legacy data warehouses). Similarly, it's an empirical fact that many business users have the clout to order up new data marts as well.
  • Consolidating data marts onto one common technological platform has important benefits.

In essence, Greenplum is pitching this story:

  • Thesis: Enterprise Data Warehouses (EDWs)
  • Antithesis: Data Warehouse Appliances
  • Synthesis: Greenplum's Enterprise Data Cloud vision

When put that starkly, the story is overstated, not least because:

Specialized Analytic DBMS != Data Warehouse Appliance

But basically it makes sense, for two main reasons:

  • Analysis is performed on all sorts of novel data, from sources far beyond an enterprise's core transactions. This data doesn't have to fit the core enterprise data model, nor does it particularly benefit from being tightly fitted into it. Requiring it to do so just imposes an unnecessary and painful bureaucratic delay.
  • On the other hand, consolidation can be a good idea even when systems don't particularly interoperate. Data marts, which commonly do in part interoperate with central data stores, have all the more reason to be consolidated onto a central technology platform/stack.

Of course, the EDC vision isn't quite as new or differentiated as Greenplum ideally would wish one to believe.

  • To a first approximation, EDC sounds a lot like what eBay has already built on Teradata equipment.
  • Greenplum's EDC vision also sounds a lot like what Stuart Frost was talking about at DATAllegro, what Dell was planning to build on DATAllegro equipment, and what Stuart continues to talk about now that he's been acquired into Microsoft.
  • Something like EDC can also be presumed to be implicit in the strategies of the other one-size-fits-all vendors -- i.e., Oracle and IBM.
  • Greenplum has only implemented a little more of the EDC vision so far than have other firms, unless you give it credit for being cheap/fast/MPP/running on commodity hardware, but deny that credit to Teradata (specialized hardware, and not cheap in its most popular configurations), Oracle (ditto for Exadata), IBM (also not cheap), or Microsoft/DATAllegro (not released yet).
  • Specifically: In Greenplum Release 3.3, which is being announced today, Greenplum is introducing the (enhanced?) ability for data marts to be spun out as a background operation, while the database otherwise remains functional. As of 3.3, spinning out a data mart is a command-line operation. But in Release 3.4, Greenplum plans to offer a web-based interface for same, at which point the "self-service data mart creation" discussion will become operative. Otherwise, EDC is a roadmap/vision/statement-of-direction much more than it is a fully-baked technical project.

One particular source of potential confusion is Greenplum's emphasis on the buzzphrase "self-service data mart." This seems to conflate two related concepts:

  • End users should be able to create new data marts themselves. Strictly speaking, I view this ability as useless at most enterprises, and important at very few, because of logistical issues. (Who gives the permissions? Who decides which hardware is used?) That said, useless "end user" tools often wind up being important productivity aids for IT professionals, and this kind of "self-service" would surely be another example. Edit: Hmm. Doug Henschen inspired me to think that over again, and I'm beginning to soften. Suppose users could order up the data mart they want, perhaps test it at a very low processing priority (if they choose), and then send the completed request to IT for approval and provisioning. That would have some value.
  • End users should be able to manage data marts themselves, once created. That's a great idea, full of agility and don't-make-IT-a-roadblock goodness. Data miners and similar analytic professionals commonly have the technical ability to manage a simple database, and should be allowed to do so if it's ensured that they don't break anything for anybody else.
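The softened version of "self-service" creation described in the edit above (the user orders up a mart, optionally tests it at low priority, then submits it to IT for approval and provisioning) can be modeled as a small state machine. This is a minimal Python sketch under that assumption; the state names, `MartRequest`, and the transition table are hypothetical illustrations, not anything Greenplum has announced.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum, auto


class MartState(Enum):
    DRAFT = auto()             # user is still specifying the mart
    TESTING = auto()           # optional trial run at very low processing priority
    PENDING_APPROVAL = auto()  # completed request sent to IT
    PROVISIONED = auto()       # IT approved it and allocated resources
    REJECTED = auto()          # IT declined; the user may revise and resubmit


# Hypothetical transition table: the states each state may move to.
TRANSITIONS = {
    MartState.DRAFT: {MartState.TESTING, MartState.PENDING_APPROVAL},
    MartState.TESTING: {MartState.DRAFT, MartState.PENDING_APPROVAL},
    MartState.PENDING_APPROVAL: {MartState.PROVISIONED, MartState.REJECTED},
    MartState.PROVISIONED: set(),
    MartState.REJECTED: {MartState.DRAFT},
}

# Approval decisions are reserved for IT.
IT_ONLY = {MartState.PROVISIONED, MartState.REJECTED}


@dataclass
class MartRequest:
    owner: str
    tables: list[str]
    state: MartState = MartState.DRAFT
    history: list[MartState] = field(default_factory=list)

    def move_to(self, new_state: MartState, by_it: bool = False) -> None:
        # Refuse transitions the workflow doesn't allow (e.g. skipping approval).
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.name} to {new_state.name}")
        # Users can draft and test, but only IT can approve or reject.
        if new_state in IT_ONLY and not by_it:
            raise PermissionError("approval decisions are reserved for IT")
        self.history.append(self.state)
        self.state = new_state
```

Keeping the last step IT-only is one way to answer the logistical questions raised above (who grants permissions, who picks the hardware) while still letting users draft and trial their own requests.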

One thing that's needed for this technology to come to full fruition is sophisticated data movement and synchronization. Ideally, some tables in a data mart could be virtual -- views against a central database. But others would be physically recopied from the center, with all the ETL / ELT / ETLT / replication issues that entails. Meanwhile, it's not obvious that the ideal architecture is a simple-minded hub-and-spoke. Perhaps one should be able to spin data marts out of other marts, which would at least somewhat reduce the proliferation of tables and the recopying of data. And it should be easy for administrators to change deployment strategies, e.g., by starting a table out as a view and converting it to a physical copy as usage profiles change.
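To make the view-versus-copy idea concrete, here is a minimal Python sketch of a mart catalog in which each table is either a view or a physical copy of its source, marts can be spun out of other marts, and a table can be switched from view to copy. It assumes a PostgreSQL-style SQL dialect (`CREATE VIEW ... AS`, `CREATE TABLE ... AS`); the `central` schema name, `DataMart`, and `MartTable` are hypothetical, and this is not how Greenplum 3.3 actually spins out marts.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Deployment(Enum):
    VIEW = "view"  # virtual: reads the source table at query time
    COPY = "copy"  # physical: data recopied from the source


@dataclass
class MartTable:
    name: str
    source: str  # qualified source table, e.g. "central.orders" or "mart_a.orders"
    deployment: Deployment = Deployment.VIEW


@dataclass
class DataMart:
    schema: str
    parent: DataMart | None = None  # None: spun out of the central database
    tables: dict[str, MartTable] = field(default_factory=dict)

    def add_table(self, name: str, deployment: Deployment = Deployment.VIEW) -> None:
        if self.parent is None:
            source_schema = "central"  # hypothetical name for the central schema
        else:
            # A mart spun out of another mart reads from that mart, not the center.
            if name not in self.parent.tables:
                raise KeyError(f"{self.parent.schema} has no table {name}")
            source_schema = self.parent.schema
        self.tables[name] = MartTable(name, f"{source_schema}.{name}", deployment)

    def ddl(self, name: str) -> str:
        # Emit the SQL that would create this table as a view or a one-time copy.
        t = self.tables[name]
        kind = "VIEW" if t.deployment is Deployment.VIEW else "TABLE"
        return f"CREATE {kind} {self.schema}.{t.name} AS SELECT * FROM {t.source};"

    def switch_to_copy(self, name: str) -> list[str]:
        # Change strategy as usage grows: drop the view, materialize a physical copy.
        t = self.tables[name]
        if t.deployment is Deployment.COPY:
            return []  # already physical; nothing to do
        t.deployment = Deployment.COPY
        return [f"DROP VIEW {self.schema}.{name};", self.ddl(name)]
```

The sketch deliberately leaves out the hard parts the paragraph points to: `CREATE TABLE ... AS` makes only a one-time copy, so keeping it current still needs ETL or replication, and dropping a view that another mart depends on would have to be handled as well.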

Oliver Ratzesberger of eBay also argues that workload management -- not a current Greenplum strength -- can be crucial. For example, if the CEO wants the CFO to get her an answer TODAY, the fastest approach may be to create an entirely virtual data mart, with very favorable SLAs (Service Level Agreements). More generally, if you're setting up dozens of marts that contain views of the central database, sophisticated SLA management can be essential. There's a big virtualization opportunity here -- but virtualization requires a lot of system management infrastructure.
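One simple way to favor a mart with a tight SLA, such as the CFO's answer-it-today mart, is earliest-deadline-first scheduling. The Python sketch below uses that textbook technique as a stand-in; it is not how eBay, Teradata, or Greenplum actually manage workloads, and `MartSLA`, `SLAScheduler`, and the mart names are hypothetical.

```python
from __future__ import annotations

import heapq
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class MartSLA:
    mart: str
    max_response_s: float  # promised response time for this mart's queries


class SLAScheduler:
    """Earliest-deadline-first queue for queries arriving from many marts."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, str, str]] = []
        self._seq = itertools.count()  # tie-breaker: equal deadlines run in arrival order

    def submit(self, sla: MartSLA, query: str, arrival_s: float) -> None:
        # A query's deadline is its arrival time plus its mart's SLA.
        deadline = arrival_s + sla.max_response_s
        heapq.heappush(self._heap, (deadline, next(self._seq), sla.mart, query))

    def next_query(self) -> tuple[str, str, float] | None:
        # Pop the query whose deadline is soonest; None if the queue is empty.
        if not self._heap:
            return None
        deadline, _, mart, query = heapq.heappop(self._heap)
        return mart, query, deadline

    def __len__(self) -> int:
        return len(self._heap)
```

For example, a routine sandbox query that arrives first with a one-hour SLA still runs after an urgent query that arrives ten seconds later with a one-minute SLA:

```python
sched = SLAScheduler()
sched.submit(MartSLA("marketing_sandbox", 3600), "SELECT ...", arrival_s=0)
sched.submit(MartSLA("cfo_answer_today", 60), "SELECT ...", arrival_s=10)
print(sched.next_query()[0])  # cfo_answer_today
```

Real workload management also has to allocate CPU, memory, and I/O among concurrent queries and decide which work to admit at all, which is part of the system-management infrastructure the paragraph says virtualization requires.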
