Why MapReduce Matters to SQL Data Warehousing - InformationWeek
8/28/2008
08:53 AM
Curt Monash

Why MapReduce Matters to SQL Data Warehousing

Greenplum and Aster Data have both just announced the integration of MapReduce into their SQL MPP data warehouse products. So why do I think this could be a big deal? The short answer is "Because MapReduce offers dramatic performance gains in analytic application areas that still need great performance speed-up." The long answer goes something like this.

The core ideas of MapReduce are:

• For large problems, parallel computing is much more cost effective and/or feasible than the alternatives.
• If you shoehorn programs into a certain very simple framework - namely that you're limited to only having map and reduce steps - then building a general execution engine that gives parallelism "for free" is straightforward.
• A lot more problems can be solved within that framework than one might at first expect.

In essence, you can do almost anything to a single record* - that's a map step. But you are sharply limited in how you combine information about multiple (often intermediate) records - that's a reduce step. Still, reduce steps let you do counts, sums, or other aggregations. That, plus the general power of map steps, makes MapReduce useful for at least three major classes of applications:

1. Text tokenization, indexing, and search
2. Creation of other kinds of data structures (e.g., graphs)
3. Data mining and machine learning

Except for the building of entire search engines, these are all application areas that data warehouse users should and do care about. And they all still could benefit from large performance increases, as is evidenced by the routine compromises analysts make in areas such as data reduction, sampling, over-simplified models and the like.

*Technically, MapReduce doesn't allow for records. Instead, you process key-value pairs and lists of same. But so far as I can tell, that's a distinction without a difference. LISP long ago proved that lists are a very general construct indeed.
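
To make the map and reduce steps concrete, here is a minimal single-process sketch in Python. It is not how Greenplum, Aster Data, or any other product implements MapReduce; the names `map_reduce`, `tokenize`, and `count` are hypothetical and chosen only for illustration. The map step emits key-value pairs from one record at a time, and the reduce step is limited to aggregating the values collected under each key.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Toy engine: map each record, group emitted pairs by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        # Map step: each record is handled independently of all the others,
        # which is what would let a real engine spread the work across nodes.
        for key, value in map_fn(record):
            groups[key].append(value)
    # Reduce step: each key's list of values is combined on its own.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def tokenize(line):
    """Map function: emit a (word, 1) pair for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def count(key, values):
    """Reduce function: a simple aggregation (a sum) over one key's values."""
    return sum(values)


lines = ["map reduce map", "reduce steps aggregate"]
word_counts = map_reduce(lines, tokenize, count)
```

Because the map calls never look at each other's records and each reduce call sees only one key's values, both loops could in principle be partitioned across machines without changing the user-supplied functions.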

MapReduce can be superior to pure SQL for these application areas, because they involve creation of data structures that are awkward to fit into a SQL rows-and-tables paradigm. Inverted-list text indexes just aren't tables. Formally, graphs can always be fit into tables; but even so, if you want to follow a graph for numerous hops, relational structures can be problematic. Data mining can involve very high-dimensional problems with super-sparse tables. And while exhaustive text extraction into flat tables works OK, getting from there to common-sense semantic hierarchies can be a bit of a kludge.
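
As an illustration of the inverted-list point, the sketch below builds a tiny text index with one map step and one reduce step. It assumes a toy in-memory engine, and the names `map_reduce`, `emit_terms`, and `posting_list`, along with the sample documents, are hypothetical. The result maps each term to a variable-length list of document IDs, which is the shape that does not sit naturally in fixed-width rows.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Toy single-process engine: map, group by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def emit_terms(doc):
    """Map function: emit (term, doc_id) once per distinct term in a document."""
    doc_id, text = doc
    for term in set(text.lower().split()):
        yield term, doc_id


def posting_list(term, doc_ids):
    """Reduce function: combine one term's document IDs into a sorted list."""
    return sorted(doc_ids)


# Hypothetical sample documents as (doc_id, text) pairs.
docs = [
    (1, "SQL data warehouse"),
    (2, "MapReduce data mining"),
    (3, "SQL MapReduce"),
]
index = map_reduce(docs, emit_terms, posting_list)
```

Looking up `index["sql"]` returns the list of documents containing that term directly, with no join or per-row scan.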

Additional links about MapReduce:

• Three major applications of MapReduce
• Another application of MapReduce
• Sound bites about MapReduce
• Other links about MapReduce
