9 NoSQL Pioneers Who Modernized Data Management - InformationWeek

8/30/2015
12:06 PM
Charles Babcock

The folks profiled here are tackling data management for the Internet Age, helping us all understand what can be done with a mass of unstructured information. See how their work has transformed the way we handle databases.
Doug Cutting

Doug Cutting, the original author of Hadoop with Michael Cafarella, once said on a panel reported on by InformationWeek that the distributed systems found in cloud computing "are not just for solving point problems. You have this ability to do things that you hadn't thought of before. … If you've got thousands of processors, you can do a whole lot with just a few crude tools."

Technically speaking, Hadoop is not a NoSQL system. It started out in 2005 as a distributed file system, capable of efficiently storing and sorting millions of objects using Google's MapReduce. But the example of what a distributed data management system could do helped fuel an explosion of innovation and effort. Amazon's publication of a paper on its Dynamo NoSQL system in 2007 helped fan the flames.

When he co-created Hadoop, Cutting was working at Yahoo on Nutch, a crawler-based search engine for indexing the Web. He hit upon combining his batch-sorting system with MapReduce, which allowed it to scale out to much larger capacities. The name Hadoop is not an acronym; Cutting named it after a stuffed toy elephant in his family.

Cutting no longer actively contributes code to Hadoop, but he still monitors and comments on its development. He is now a software architect at Cloudera, a firm that produces a management layer, tools, and special features on top of Hadoop. He was elected chairman of the Apache Software Foundation in September 2010 in recognition of his open source project leadership. Apache continues to sponsor ongoing work on Hadoop. Cutting's colleague, Cafarella, went on to become an assistant professor of computer science at the University of Michigan, Ann Arbor.

(Image: Cloudera)

Charlie Babcock,
User Rank: Author
9/4/2015 | 6:25:03 PM
How about Chris Lindblad as the 10th?
A nominee for the tenth pioneer has come in: Chris Lindblad, who in 2001 co-founded Cerisent, the predecessor of MarkLogic. It became MarkLogic in 2005, with headquarters in San Carlos, Calif. He is the former chief architect of the Ultraseek search engine at Infoseek; Ultraseek is now part of Autonomy. Lindblad still works as chief of development at the firm. MarkLogic is a document-oriented database that evolved out of XML database roots and can also conduct the ACID transactions associated with relational systems. The BBC used MarkLogic for its 2012 Olympic Data Services. So is Chris a NoSQL pioneer or a combined database system pioneer? Any votes for Chris Lindblad?

 

 
Charlie Babcock,
User Rank: Author
9/4/2015 | 5:53:11 PM
Altiscale CEO describes Cutting's sense of system design
This comment came in from Raymie Stata, who hired Doug Cutting at Yahoo when Stata was chief architect of algorithmic search. (He is now CEO of Altiscale, a Hadoop company.) "What I appreciate about Doug is that he has a great design sense--relatively few programmers have that--and yet he's also very practical (and prodigious), so he gets things done fast. Lucene and Avro demonstrate Doug's originality and creativity, and the result is clean but practical systems that have become very popular. The case of Hadoop is different: his good design sense told him that the Google guys did a great job and that there wasn't much sense in trying to improve upon that. He stated this quite explicitly. While those around them (some inside Yahoo!, some out) were busy trying to improve upon the MapReduce paradigm, Doug used Google's paper as the blueprint and (with Mike Cafarella) cranked out the initial implementation amazingly fast. This was important, because it turned out that the important engineering was more about building an implementation that could scale, rather than improving upon the abstraction. It's unusual for a single developer to have two 'smash hits' in Open Source (Lucene and Hadoop). I chalk that up to Doug's combination of design sense and practicality." - Raymie Stata, CEO of Altiscale

 

 
Charlie Babcock,
User Rank: Author
8/31/2015 | 4:22:55 PM
NoSQL system builders are not necessarily data scientists
Asksqn, few NoSQL pioneers would ever claim to be data scientists. They're system builders for big data purposes, not data scientists working with big data. But you might try Lisa Morgan's 6 Characteristics of Data-Driven Rock Stars: http://www.informationweek.com/big-data/big-data-analytics/6-characteristics-of-data-driven-rock-stars/d/d-id/1320502

 