MapReduce: And You Were There


Neil Raden, Contributor

August 29, 2008

InformationWeek

There's been a lot of buzz lately about Google's MapReduce framework for speeding up the processing of large datasets. It makes you wonder: did Google just dream this up in the last couple of years while all of the database vendors were sleeping? Or, paraphrasing Isaac Newton, were they standing on the shoulders of giants?

The answer is, both.

MapReduce is a programming framework, not a language per se. It is built on an old (40+ years) programming paradigm called functional programming (just for the record, the other major paradigm is imperative programming, which includes common languages like C# and Java). Maybe I shouldn't have said old, because my first programming language was an early functional language, APL. I was a casualty actuary, and APL was perfect for the kinds of mathematical manipulations we needed to do, such as inverting a matrix in one keystroke, recursion, and manipulating n-dimensional structures with composite functions. We used to drive IT nuts. Functional languages are built, as the name suggests, around mathematical functions, and some well-known functional languages today include K, a successor to APL, and the statistical language R.

The separation of functional and imperative languages is pretty leaky these days as lots of functional programming ideas have seeped into other languages. In particular, the concepts of map and reduce are widely implemented. So why, then, does it matter what you use?
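For readers who have not met them, here is roughly what map and reduce look like in a mainstream language. This is a minimal sketch in Python, not Google's framework; the word list and variable names are made up for illustration.

```python
from functools import reduce

# Illustrative input; any list of strings would do.
words = ["map", "reduce", "map", "apl", "reduce", "map"]

# map: apply one function to every element, independently of the others.
lengths = list(map(len, words))

# reduce: fold the mapped values into a single result.
total_chars = reduce(lambda acc, n: acc + n, lengths, 0)

print(lengths)
print(total_chars)
```

Because each element is mapped on its own, the map step is the part that can be spread across many machines; the reduce step then combines the partial results.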

The symbolic language and its syntax, rules and scope have a lot to do with what programmers can achieve and how easily they can do it. But computers don't execute symbolic code; it has to be turned into instructions that a computer (or a whole bunch of computers) can understand. If every language just gets reduced to this level, you might wonder what the difference is. The real advantage is in the compiler. In a functional language, two map functions used in composition (putting functions together) can be understood together at compilation, which lets the compiler eliminate the second, expensive map. The compiler designer, working from a purely functional position, can develop optimizations that really leverage the symbolic language.
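The idea of folding two maps into one can be sketched by hand. The Python below only illustrates the equivalence a functional compiler can exploit; the Python interpreter does not perform this rewrite itself, and the helper names (compose, double, increment) are hypothetical.

```python
def compose(f, g):
    """Return a function that applies g first, then f."""
    return lambda x: f(g(x))

def double(x):
    return x * 2

def increment(x):
    return x + 1

data = list(range(5))

# Two passes: the first map builds a full intermediate list,
# and the second map walks that list again.
doubled = list(map(double, data))
two_pass = list(map(increment, doubled))

# One pass: the two functions are composed first, so the data
# is traversed once and no intermediate list is built.
one_pass = list(map(compose(increment, double), data))

print(two_pass == one_pass)
```

The rewrite is only safe when the functions have no side effects, which is exactly the guarantee a purely functional language gives its compiler.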

And this is where Google has had breakthroughs. They had to approach this problem as a fundamental aspect of doing business and developed some creative ways to really power through large sets of data, but they didn't do it alone. Computer scientists have been advancing these ideas for decades.
