9 NoSQL Pioneers Who Modernized Data Management
The folks profiled here are tackling data management for the Internet Age, helping us all understand what can be done with a mass of unstructured information. See how their work has transformed the way we handle databases.
![](https://eu-images.contentstack.com/v3/assets/blt69509c9116440be8/blt4e4e84c8cba0def7/64cb4ac84020f2567344dec5/NoSQLPioneers.jpg?width=700&auto=webp&quality=80&disable=upscale)
NoSQL started off in 2006-2007 as an edgy, against-the-mainstream name, a counterpoint to the complete dependence on the SQL access language that relational database systems had at the time.
The pioneers of NoSQL systems never said their goal was to replace SQL systems, which for the last 30 years have been the foundation of venerable relational databases such as Oracle, DB2, and PostgreSQL. Rather, NoSQL developers wanted to be freed from the restrictions and preoccupation with precision that marked SQL systems. They were driven by the new demands the Internet was placing on how databases operated in the real world.
Here are the key differences between SQL and NoSQL systems:
Where SQL relational databases were concerned only with data integrity, NoSQL is concerned with data immediacy and relevancy.
Where a SQL relational system employs the unforgiving ACID test, NoSQL invokes "eventual consistency."
Where a SQL system dictates a particular data type, NoSQL allows a loosey-goosey tolerance of many data types -- and still comes up with workable intelligence.
Instead of precision with defined schemas, NoSQL pioneers sought an ability to handle information at high volume and high speed. Instead of getting one transaction exactly right, they wanted to deal with a million users at once.
NoSQL offered the sort of approach that a Twitter or Facebook might appreciate. And, in fact, those organizations quickly became big NoSQL users. Avinash Lakshman at Facebook was a pioneer involved in the formation of two NoSQL systems: Dynamo, during a prior stint at Amazon, and Cassandra, at Facebook.
For companies with robust public-facing Internet operations – such as social media, financial institutions, and retailers – customer service is a primary business driver for deploying NoSQL systems.
Let's use Facebook as an example. Does it really matter if someone sees you have 223 friends on Facebook when, in fact, a second ago you picked up your 224th? Users may tolerate such a lag. What they won't tolerate is any delay in getting a response from Facebook when they have a problem. Now, scale that out to thousands of servers answering users' questions, and sifting the terabytes of data required to do so, and you begin to see what NoSQL developers are up against. They're tackling data management for the Internet age. And we say, more power to them.
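The friend-count lag described above is what "eventual consistency" looks like in practice. The following is a minimal sketch of the idea, not how Facebook or any particular NoSQL system implements it. The `Replica` and `EventuallyConsistentStore` classes, the three-replica setup, and the explicit `sync()` step are all assumptions made for illustration.

```python
class Replica:
    """One copy of the data (hypothetical, for illustration only)."""

    def __init__(self):
        self.data = {}
        self.pending = []  # writes received but not yet applied here

    def apply_pending(self):
        # Apply queued writes in the order they arrived.
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()


class EventuallyConsistentStore:
    """Acknowledges a write after one replica has it; others catch up later."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]

    def write(self, key, value):
        primary, *others = self.replicas
        primary.data[key] = value            # visible on the primary right away
        for replica in others:
            replica.pending.append((key, value))  # delivered lazily

    def read(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        # Stand-in for background replication between servers.
        for replica in self.replicas:
            replica.apply_pending()


store = EventuallyConsistentStore()
store.write("friends", 223)
store.sync()                      # every replica now agrees on 223
store.write("friends", 224)       # the 224th friend arrives

stale = store.read("friends", replica_index=1)      # a lagging replica
fresh = store.read("friends", replica_index=0)      # the replica that took the write
store.sync()
converged = store.read("friends", replica_index=1)  # the lag has closed
```

A reader routed to the lagging replica briefly sees the old count, but no reader ever waits on cross-server coordination, which is the trade the article describes.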
Let's celebrate some of the pioneers of the field. The folks profiled here have all acknowledged that NoSQL is less about being opposed to relational SQL database systems, and more about what can be done with a mass of unstructured information. There's no claim to a hierarchy in this list. The pioneers featured here are by no means the only developers and entrepreneurs who've made contributions. No attempt has been made to represent all the key developers, and we'd welcome your help in rounding out our list.
Amazon, for example, declined to name the individuals behind SimpleDB, an early NoSQL system that emerged in December 2007 and remains available as a service today, because it doesn't single out individuals from team efforts.
With that, we give you nine NoSQL pioneers whose work has changed how we process and respond to information. What has your experience been with NoSQL? Have you had the pleasure of working with any of these pioneers? Are there others you'd like to see get credit for their work in NoSQL? Tell us all about it in the comments section below.
Doug Cutting, the original author of Hadoop with Michael Cafarella, once said on a panel reported on by InformationWeek that the distributed systems found in cloud computing "are not just for solving point problems. You have this ability to do things that you hadn't thought of before. … If you've got thousands of processors, you can do a whole lot with just a few crude tools."
Technically speaking, Hadoop is not a NoSQL system. It started out in 2005 as a distributed file system, capable of efficiently storing and sorting millions of objects using Google's MapReduce. But the example of what a distributed data management system could do helped fuel an explosion of innovation and effort. Amazon's publication of a paper on its Dynamo NoSQL system in 2007 helped fan the flames.
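For readers unfamiliar with the MapReduce model mentioned above, here is a single-machine sketch of its three stages, using word counting as the example task. It is an illustration of the programming model only, not Hadoop's API. The function names `map_phase`, `shuffle`, and `reduce_phase` and the sample documents are assumptions.

```python
from collections import defaultdict


def map_phase(document):
    # Emit a (word, 1) pair for every word in one document.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    # Group all emitted values by key, as the framework would do between stages.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Collapse the values for one key into a single result.
    return key, sum(values)


documents = ["the web is big", "the index is bigger"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster, each document's map step and each key's reduce step could run on a different machine, which is where the "thousands of processors" in Cutting's remark come in.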
When he co-created Hadoop, Cutting was working at Yahoo on Nutch, a crawler-based search engine for indexing the Web. He hit upon combining his batch-sorting system with MapReduce and allowing it to scale out to much larger capacities. As a name, Hadoop has no significance as an acronym. Cutting named it for a stuffed toy elephant in his family.
Cutting no longer actively contributes code to Hadoop, but he still monitors and comments on Hadoop developments. He is now a software architect at Cloudera, a firm producing a management layer and tools and special features on top of Hadoop. He was elected president of the Apache Software Foundation in September 2010 in recognition of his open source project leadership. Apache continues to sponsor ongoing work on Hadoop. Cutting's colleague, Cafarella, went on to become an assistant professor of computer science at the University of Michigan, Ann Arbor.
Dwight Merriman, co-founder and CEO of 10Gen, the firm that sponsored the project developing the document-oriented MongoDB system, didn't really need to add another feather to his cap. Merriman was already co-founder and CTO of DoubleClick, where he architected the online advertising management system DART. DoubleClick was sold to Hellman & Friedman in 2005 and resold to Google in 2007 for $3.1 billion. DoubleClick serves tens of billions of ads a day.
As a founder of 10Gen, now known as MongoDB Inc., Merriman wrote some of the original system's code and says the name derives from "humongous." It's a database system that scales out by adding nodes to a server cluster. Unlike some NoSQL systems, MongoDB can be easily indexed. "On a technical level, 10Gen is one of the few New York companies that impresses Silicon Valley sophisticates," noted the technology-oriented online news service Observer.com in October 2011.
Merriman chairs the company MongoDB, which provides professional services and technical support to enterprise users of the open source system. Merriman is also a co-founder of AlleyCorp, a group of Internet companies clustered around Silicon Alley in NYC.
CouchDB is similar to MongoDB in its design as a document or software object storage system. Damien Katz, a former Lotus Notes developer at IBM, first announced in an April 2005 blog post that he was working on a "large scale object database" that would be a lightweight, document-oriented system.
Katz self-funded the project through 2005-2006, until it became an Apache open source project in February 2007. Katz, meanwhile, was serving as CTO of CouchOne, the firm he founded to support CouchDB. The "Couch" in CouchDB is an acronym for a core cloud-computing concept: Cluster Of Unreliable Commodity Hardware.
In January 2012, Katz announced in a blog post that he was leaving Apache CouchDB behind to work on Couchbase Server, a combination of CouchDB and memcached. Parts of CouchDB were rewritten from Erlang into C. CouchOne had merged with Membase in February 2011 to form Couchbase, a NoSQL system managing JSON objects with scalable cache management. Katz was chief architect and CTO of Couchbase through August 2013.
In establishing Couchbase, Katz was joined by James Phillips, senior VP of products and former chief strategy officer of NorthScale. Zynga implemented NorthScale's data caching system for use with Mafia Wars, Farmville, and other online games for millions of users.
"For years I've tried my damnedest to get away from C. Too simple, too many details to manage, too old and crufty, too low level. I've had intense and torrid love affairs with Java, C++, and Erlang. I've built things I'm proud of with all of them, and yet each has broken my heart," Katz wrote in "The Unreasonable Effectiveness of C," his January 2013 tribute to the C language.
Two years ago, Katz left Couchbase as it received $25 million in venture capital funding. He became an architect for Salesforce.com in 2014, leaving after a year to become a self-employed programmer, a job he described as "CEO and janitor." That seemed about right for a developer whose favorite saying was: "Just relax. Nothing is under control."
In July 2015, Katz began producing Focus and Drive, an online show in which he and friends drive to different startups to talk about what they're doing.
Salvatore Sanfilippo developed the Redis in-memory NoSQL system for use in connection with his two small technology businesses. The code took on a life of its own, a community formed around it, and Sanfilippo, accustomed to working on code from his home in Sicily, was pressured to start a company around Redis. He declined, wanting to reserve some time to spend with his family.
In March 2010, Sanfilippo joined the payroll of VMware, then later the VMware-spinoff Pivotal, to continue contributing to Redis. He was a frequent contributor of tools for extracting data, among other things.
In explaining his decision to write Redis, he told European online publication Eu-startups.com in January 2011 that he had previously tried to get MySQL to do things that it was not good at doing. "Why there is no database that is able to natively handle natural ordering of items, that is, I put things inside with this order, so it should be fast to get the latest N items. After this consideration, I started coding a prototype of the system, and shared the first beta on Hacker News, receiving good feedback."
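The "latest N items" problem Sanfilippo describes can be sketched in a few lines. The example below is plain Python and does not call Redis. It only mimics the pattern of pushing new items onto the head of a capped list and reading a short range back, which Redis exposes through list commands such as LPUSH, LTRIM, and LRANGE. The `RecentItems` class, its capacity, and the `post:` keys are hypothetical.

```python
from collections import deque
from itertools import islice


class RecentItems:
    """Keeps items in insertion order, newest first, up to a fixed capacity."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest entry automatically when full.
        self._items = deque(maxlen=capacity)

    def push(self, item):
        # Constant-time insert at the head, so the newest item is always first.
        self._items.appendleft(item)

    def latest(self, n):
        # Read only the first n entries; no sorting or full scan is needed.
        return list(islice(self._items, n))


timeline = RecentItems(capacity=1000)
for post_id in range(1, 6):
    timeline.push(f"post:{post_id}")

latest_three = timeline.latest(3)
```

Because the order is maintained at write time, fetching the newest entries costs the same whether the list holds five items or a thousand. A relational table would typically need an index and an ORDER BY ... LIMIT query to get the same answer.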
Derek Collison, chief architect of VMware's cloud division, noted in his own blog at the time: "Many Redis customers have already experienced the tremendous benefit of storing select pieces of data within Redis for fast access... Some have used Redis exclusively, forgoing a relational database all together."
On June 25, 2015, Redis Labs in Mountain View, Calif., the largest company to grow up around Redis, announced that Sanfilippo was joining the firm as lead developer of the open source Redis system. "I'll be able to work as I do currently, spending all my time in the open source side of the project," Sanfilippo wrote on his blog, Antirez. Redis is currently ranked at No. 10 in popularity among all database systems by DB-Engines. Only two other NoSQL systems, Cassandra and MongoDB, make it into the top 10. The other systems in the top 10 are relational database systems, including Oracle, MySQL, and SQL Server in the No. 1, 2, and 3 spots, respectively.
Andy Gross is the chief architect and co-author of Riak, a decentralized document database system that uses MapReduce. Gross has stated Riak was inspired by Amazon's Dynamo. Riak scales easily and predictably by enlarging its server cluster.
Gross and a development team that largely emigrated from Akamai have given Riak features that make it easier for developers to quickly prototype and test potential NoSQL applications with the system. This has helped speed up deployment of mass data handling systems at companies such as Comcast and Mozilla. It's also been used for the Ask.com search engine's sponsored-link system.
Like most NoSQL systems, Riak was designed to be fault tolerant, so that if a piece of hardware fails underneath it, its data can be recovered.
Riak is the system of Basho Technologies; Gross was Basho's chief architect from 2007 through March 2014. He left to become a software engineer with Twitter, then founded Opsee in February 2015. In July, Gross became principal software engineer of Gofactory.
He is also a former senior engineer at Apple, Akamai Technologies, and Mochi Media. In October 2012, on the fifth anniversary of Amazon's white paper on Dynamo, Gross gave a talk at QCon in San Francisco on how influential Dynamo had been in inspiring many of the subsequent NoSQL systems.
Brad Fitzpatrick was the creator of LiveJournal.com at his company, Danga Interactive, and an author of the Memcached software that powered it. LiveJournal was an online personal journaling system for use by people around the globe.
In 2007 Fitzpatrick went to work for Six Apart as chief architect. Several of the other contributors to Memcached, a distributed object caching system, formed NorthScale to build out the key-value store system, Membase, which was used beneath Memcached as persistent storage. Membase was a NoSQL document-oriented system. In 2011 its authors merged with CouchOne to become the flagship of a new company called Couchbase, to offer what became known as Couchbase Server. It was designed to provide easy-to-scale document access with high volume throughput. No single individual is associated with the emergence of Membase.
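The arrangement described above, a fast in-memory cache sitting in front of a persistent store, is often called the cache-aside pattern. Here is a minimal sketch of it under stated assumptions. It does not use the Memcached or Membase client APIs. The `CacheAside` class, the dictionary standing in for persistent storage, and the `journal:42` key are invented for illustration.

```python
class CacheAside:
    """Checks an in-memory cache first and falls back to a slower backing store."""

    def __init__(self, backing_store):
        self.cache = {}                     # stands in for the in-memory cache tier
        self.backing_store = backing_store  # stands in for persistent storage
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]          # cache hit: no trip to storage
        self.misses += 1
        value = self.backing_store.get(key)
        if value is not None:
            self.cache[key] = value         # populate the cache for next time
        return value

    def put(self, key, value):
        self.backing_store[key] = value     # persist first
        self.cache.pop(key, None)           # then invalidate any stale cached copy


persistent = {"journal:42": "first entry"}
store = CacheAside(persistent)

first = store.get("journal:42")    # miss: loaded from the backing store
second = store.get("journal:42")   # hit: served from memory
misses_after_reads = store.misses

store.put("journal:42", "edited entry")
third = store.get("journal:42")    # miss again, because the write invalidated the cache
```

Invalidating on write, rather than updating the cached copy in place, is one common design choice; it keeps the cache from serving a value that storage no longer holds.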
At last report, Memcached author Fitzpatrick had become part of the Go team at Google. LinkedIn lists him as a member of the software engineering staff there.
Who would you name as the 10th member of this list of NoSQL pioneers? Tell us in the comments section below.