Big Data. Big Decisions
InformationWeek
Special Coverage Series


Big Data Debate: End Near For Data Warehousing?

Some Hadoop advocates say this new platform will unseat the relational data warehouse from its dominant role in BI. Database champions say 'not so fast!' Share your opinion.

The enterprise data warehouse (EDW) is the backbone of analytics and business intelligence for most large organizations and many midsize firms. The tools and techniques are proven, the SQL query language is well known, and there's plenty of expertise available to keep EDWs humming.

The downside of many relational data warehousing approaches is that they're rigid and hard to change. You start by modeling the data and creating a schema, but this assumes you know all the questions you'll need to answer. When new data sources and new questions arise, the schema and related ETL and BI applications have to be updated, which usually requires an expensive, time-consuming effort.

Enter Hadoop, which lets you store data on a massive scale at low cost (compared with similarly scaled commercial databases). What's more, it easily handles variety, complexity and change because you don't have to conform all the data to a predefined schema.
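To make the contrast concrete, here is a minimal sketch in Python. It is only an illustration, not how Hadoop or any particular warehouse works: an in-memory SQLite table stands in for the warehouse, a plain list of JSON lines stands in for raw files landed in Hadoop, and the event fields and function names are hypothetical.

```python
import json
import sqlite3

# Hypothetical clickstream events. The second one carries a field
# ("referrer") that nobody planned for when the schema was designed.
raw_events = [
    '{"customer": "ann", "event_type": "view", "sku": "A1"}',
    '{"customer": "bob", "event_type": "buy", "sku": "B2", "referrer": "email"}',
]

# Schema-on-write: the table is defined first, so every row must fit it.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (customer TEXT, event_type TEXT, sku TEXT)")


def load_into_warehouse(line):
    """Insert one event; any field without a matching column is dropped."""
    event = json.loads(line)
    db.execute(
        "INSERT INTO events (customer, event_type, sku) VALUES (?, ?, ?)",
        (event["customer"], event["event_type"], event["sku"]),
    )


for line in raw_events:
    load_into_warehouse(line)

# Column names the warehouse stand-in actually knows about.
warehouse_columns = [row[1] for row in db.execute("PRAGMA table_info(events)")]


# Schema-on-read: keep the raw lines untouched and decide which fields
# matter only when a question is asked.
def query_raw(lines, field):
    """Return the value of `field` from every raw event that contains it."""
    parsed = (json.loads(line) for line in lines)
    return [event[field] for event in parsed if field in event]


referrers = query_raw(raw_events, "referrer")

print(warehouse_columns)
print(referrers)
```

In this sketch the unplanned "referrer" field never reaches the table, and capturing it would mean changing the schema and the load step. The raw lines still hold it, at the price of parsing and interpreting the data every time a question is asked.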

That sounds great, but where do you find qualified people who know how to use Pig, Hive, Sqoop and other tools needed to run Hadoop? More importantly, how do you get fast answers out of a batch-oriented platform that depends on slow and iterative MapReduce data processing?

Will Hadoop supplant the enterprise data warehouse and relegate relational databases to data mart roles? Or is Hadoop far too green and too slow to change the way most people work? In our debate, Scott Gnau of Teradata and Ben Werther of Platfora square off. Share your opinion using the comment tool at the end of the article.

For The Motion

Ben Werther
Founder & CEO, Platfora

The EDW Is A Relic

The proposition of the enterprise data warehouse seems tantalizing -- unifying all the data in your enterprise into one perfect database.

So you start an 18-month journey to find important data sources, agree on the important business questions, map the business processes, and architect and implement it into the one database to rule them all.

And when you are done, if you ever finish, you have a calcified relic of the world 18 months prior. If your world hasn't changed much in 18 months, then that might be ok. But that isn't the reality in any large business I've encountered.

Why is Hadoop gaining so much momentum? Clearly it's cost-effective and scalable, and it's intimately linked in people's minds to companies like Google, Yahoo and Facebook. But there's more to it. Everywhere I look, companies are generating more and more data -- interactions, logs, views, purchases, clicks, etc. These are being linked with increasing numbers of new and interesting datasets -- location data, purchased user demographics, Twitter sentiment, etc. The questions that these swirling datasets could one day support can't be known. And yet to build a data warehouse, I'd be expected to perfectly predict what data would be important and how I'd want to question it, years in advance, or spend months rearchitecting every time I was wrong. This is actually considered "best practice."

The brilliance of what Hadoop does differently is that it doesn't ask for any of these decisions up front. You can land raw data, in any format and at any size, in Hadoop with virtually no friction. You don't have to think twice about how you are going to use the data when you write it. No more throwing away data because of cost, friction or politics.

And yet, in the view of the status-quo players, Hadoop is just another data source. It is a dumping ground, and from there you can pull chunks into their carefully architected data warehouses -- their "system of record." They'll even provide you a "connector" to make the medicine go down sweet. Sure, but then you are back in the land of consultants and 12-18 month IT projects.

But let's go through the looking glass. The database isn't the "system of record" -- it is just a shadow of the data in Hadoop. In fact there is nothing more authentic than all of that raw data sitting in Hadoop. But machinery has been missing to complete the story, namely a way to do interactive business intelligence, exploration and analysis against the data in Hadoop. Platfora is among the vendors working on this need.

Imagine what this means. Raw data of any kind or type lands in Hadoop with no friction. And without building a data warehouse, without the pain of ETL integration, and without any other IT project, everyday business users can put that data to work immediately. The machinery to support this is now appearing, and users' ability to harness data is undergoing a generational shift.

There is no longer a need for a traditional data warehouse. It is an inflexible, expensive relic of a bygone age. It is time to leave the dark ages.

Ben Werther is the Founder & CEO of Platfora, the company behind the first in-memory business intelligence platform for Hadoop. He is an industry veteran and big data thought leader and was head of products at Greenplum through the EMC acquisition.

Against The Motion

Scott Gnau
President, Teradata Labs

EDWs Will Thrive

Some people suggest that relational database management systems (RDBMS), and the data warehouses built on top of them, are no longer needed. In fact, some argue that new technologies like Hadoop can do the job of the data warehouse at a fraction of the time and cost -- and, by the way, Hadoop is "free."

We can't blame some for wanting to believe the argument.

Before hitting the arguments, let me say that Hadoop has an important part in the future analytics environment because it provides a big data refinery, which can bring in massive amounts of raw material (data) -- and more importantly the corresponding analytics. One of the great features of Hadoop is that you can pile information into it without deciding in advance what you need to save or how you intend to use it. As businesses require more precise analytics, Hadoop as a source of new fuel is critical.

The core argument really comes down to a couple of points: 1. Data Warehouses are too "rigid and inflexible," and 2. The "community" will fix all of the limitations of Hadoop.

On the surface, these points sound very compelling. But with a deeper look they are misleading and self-contradictory.

Starting with the point about inflexibility of data warehouses, it's important to distinguish the technology, RDBMS, from the practice, data warehousing. The rigid schemas attributed to EDWs -- where users have to define what they are looking for before starting the search, and from which some of the misconceptions stem -- are often the result of rigid IT policy, and sometimes the result of dated or inadequate data warehouse architecture. Rigid structures are not an inherent problem in today's best data warehouse architectures that are designed for analytics.

Is structure bad in analytic environments? No! Imagine what would happen if you ran a public company and every quarter an analyst had to go through piles of un-modeled data, whether in Hadoop or otherwise, to come up with your financial quarterly results. The chance that something would go wrong in this process is too high to allow that uncertainty -- sometimes structure is really good to have!

So, do all these successful enterprises use structure and data models because it is the only way to go in an RDBMS or a Data Warehouse? Of course not. This is not about what a data warehouse can do; this is about what the business needs. Claiming that customers will stop requiring data quality and accurate data models across all their data infrastructure is misleading.

Let's move to the second question. Why would you need a data warehouse if Hadoop is going to support everything from SQL to BI in a year or two?

This claim ignores a simple fact: it took decades of work from some of the most brilliant computer scientists to build databases. Can Hadoop provide and implement the same functionality in a couple of years?

The answer is obviously no, and it would be a real shame to waste the community's efforts on rebuilding existing functionality rather than inventing newer and more extraordinary use cases. And some of the early deliverables in the Hadoop world that purport to eliminate RDBMSs require schemas and have physical design constraints that go against the "flexibility" argument for Hadoop. What's more, these claims leave out the fact that Hadoop was not originally developed for BI or SQL execution. It's like using a hammer when you really want a screwdriver -- let's free Hadoop to be the great tool it was designed to be!

History teaches us that the impact of new technologies is overestimated in the short term and underestimated in the long run. Hadoop is not and will not become a data warehouse. RDBMSs and data warehouses will thrive, not die, because of Hadoop. We think Hadoop will be an integral part of future analytic data infrastructure solutions, but not the only part!

Scott Gnau is president of Teradata Labs, where he directs all research, development and sales support activities related to data warehousing, big data analytics, and associated solutions.




By The Numbers

What Are Your Primary Concerns About Using Big Data Software?

Base: 417 respondents at organizations using or planning to deploy data analytics, BI or statistical analysis software
Data: InformationWeek 2013 Analytics, Business Intelligence and Information Management Survey of 541 business technology professionals, October 2012

What Do You Think?

What's your attitude about SQL analysis on top of Hadoop?
We want fast, standard SQL analysis capabilities on Hadoop ASAP
Hadoop is for unstructured data; SQL is for relational databases
We'll give SQL on Hadoop a try, but relational DBs will remain the mainstay
Given strong SQL support on Hadoop, we'd nix the data warehouse
We're not interested in Hadoop
No opinion


