Does MapReduce Signal The End Of The Relational Era? - InformationWeek

Commentary
12/4/2008
05:23 PM
Roger Smith

Does MapReduce Signal The End Of The Relational Era?

Companies such as Google, Yahoo, and Microsoft that operate Internet-scale cloud services need to store and process massive data sets, such as search logs, Web content collected by crawlers, and click-streams collected from a variety of Web services. Each of these companies has developed its own strategy to support parallel computations over multiple petabyte data sets on large clusters of computers.

As I wrote last week, the Google Systems Infrastructure Team used Google's MapReduce software framework to sort an astounding one petabyte of data (10 trillion 100-byte records) on 4,000 computers in six hours and two minutes. Earlier this year, Yahoo used Hadoop, an open-source MapReduce implementation, to sort one terabyte of data in 209 seconds on a 910-node cluster.

MapReduce, which Hadoop implements, is a parallel programming model in which users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
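For readers who have not seen the model in code, the following is a minimal single-machine sketch of the map and reduce roles, using word counting as the task. It is not Google's or Hadoop's API: the names `run_mapreduce`, `map_word_count`, and `reduce_word_count` and the sample documents are hypothetical, and a real framework would spread the grouping step across many machines.

```python
from collections import defaultdict


def map_word_count(key, value):
    """Map: take one (document name, text) pair and emit (word, 1) pairs."""
    for word in value.lower().split():
        yield word, 1


def reduce_word_count(key, values):
    """Reduce: merge all intermediate values that share one key."""
    return sum(values)


def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical in-memory driver; real frameworks distribute this work."""
    intermediate = defaultdict(list)
    # Map phase: every input pair may emit any number of intermediate pairs.
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            # Group intermediate values by their intermediate key.
            intermediate[out_key].append(out_value)
    # Reduce phase: one call per distinct intermediate key.
    return {k: reduce_fn(k, vs) for k, vs in intermediate.items()}


# Made-up sample input: (document name, document text) pairs.
docs = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
counts = run_mapreduce(docs, map_word_count, reduce_word_count)
print(counts["the"], counts["fox"])
```

Because each map call sees only one input pair and each reduce call sees only one key, both phases can in principle run on separate machines without coordination, which is what makes the model attractive for large clusters.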

MapReduce adoption has not been without controversy. Earlier this year, database pioneer Michael Stonebraker decried MapReduce and MapReduce clones such as Hadoop, at least from the perspective of the database community, as:

1. A giant step backward in the programming paradigm for large-scale data intensive applications
2. A sub-optimal implementation, in that it uses brute force instead of indexing
3. Not novel at all -- it represents a specific implementation of well known techniques developed nearly 25 years ago
4. Missing most of the features that are routinely included in current DBMS
5. Incompatible with all of the tools DBMS users have come to depend on.

Not surprisingly, then, other MapReduce variants have sprung up in the past few months that attempt to integrate MapReduce with SQL, including Dryad at Microsoft, Pig at Yahoo, Hive at Facebook, and Jaql at IBM. Other platforms that provide both SQL and MapReduce interfaces within a single runtime environment include two commercial frameworks, Greenplum and Aster Data.
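To see why a SQL interface over MapReduce-style processing is appealing, consider one aggregation written both ways. The sketch below is only an illustration and says nothing about how Pig, Hive, Jaql, Dryad, Greenplum, or Aster Data work internally: the table name `search_log` and the sample rows are hypothetical, and Python's built-in `sqlite3` module stands in for a SQL engine.

```python
import sqlite3
from collections import defaultdict

# Made-up search log: (user id, query text) rows.
log = [
    ("u1", "hadoop"),
    ("u2", "mapreduce"),
    ("u3", "hadoop"),
    ("u1", "sql"),
    ("u4", "hadoop"),
]

# SQL style: state the result wanted and let the engine plan the work.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_log (user_id TEXT, query TEXT)")
conn.executemany("INSERT INTO search_log VALUES (?, ?)", log)
sql_counts = dict(
    conn.execute("SELECT query, COUNT(*) FROM search_log GROUP BY query")
)
conn.close()

# MapReduce style: emit (query, 1) per row, group by key, then sum per key.
groups = defaultdict(list)
for user_id, query in log:          # map step: one intermediate pair per row
    groups[query].append(1)         # grouping by intermediate key
mr_counts = {q: sum(ones) for q, ones in groups.items()}  # reduce step

print(sql_counts == mr_counts)
```

The `GROUP BY` clause plays the part of the grouping step and `COUNT(*)` the part of the reduce function, which is why a SQL-like layer over a MapReduce runtime is a natural fit for queries of this shape.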

Last year, Michael Isard of Microsoft Research gave a fascinating Google Tech Talk on the Google campus, "Dryad: A general-purpose distributed execution platform," about Microsoft's answer to MapReduce. The talk, which has been posted on YouTube, featured some spirited Q&A from Google engineers steeped in the MapReduce style of functional programming.

Functional programming emphasizes rules, pattern-matching, and the application of mathematical functions, in contrast to procedural languages such as C++, Java, and Basic, and database query languages such as SQL, which basically tell a computer (or cluster of computers) what to do, step by step: for example, open a file, read a number, multiply by 1,000, or display something.
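The difference in style is easier to see side by side. The following is a small sketch, not taken from any of the systems discussed here, that multiplies each number in a made-up list by 1,000 and totals the results, first step by step and then by applying functions with Python's built-in `map` and `functools.reduce`. The variable names are illustrative only.

```python
from functools import reduce

numbers = [1, 2, 3, 4]  # made-up input values

# Step-by-step style: spell out each action and update state as you go.
total_procedural = 0
for n in numbers:
    scaled = n * 1000
    total_procedural += scaled

# Functional style: describe the computation as function application.
# map applies the scaling function to every element; reduce folds the
# scaled values into a single sum, starting from 0.
scaled_values = map(lambda n: n * 1000, numbers)
total_functional = reduce(lambda acc, x: acc + x, scaled_values, 0)

print(total_procedural, total_functional)
```

The functional version has no loop counter or running variable for the programmer to manage, so a framework is free to apply the mapped function to different elements on different machines.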

In a recent post, Joe Hellerstein, a professor of Computer Science at the University of California, Berkeley, recounts that Berkeley computer science undergraduates now must learn MapReduce, boasting that "MapReduce has brought a new wave of excited, bright developers to the challenge of writing parallel programs against Big Data." Similar enthusiasm for MapReduce has led Bill McColl and others to proclaim "The End Of The Relational Era," but I'm inclined to think reports detailing the imminent death of relational databases and SQL are greatly exaggerated.

I think what's more likely to happen in the near future is that major database vendors will begin offering capabilities to sort and manipulate massive data sets either directly with MapReduce or with SQL-like front-ends that will reduce MapReduce complexity. It may be too early to choose a dominant paradigm for data analytics for cloud-scale data sets but, given the familiarity of a large number of developers and DBAs with SQL, I'd be surprised if a strictly functional programming paradigm for large-scale data-intensive applications ends up carrying the day.
