Scalability and flexibility. These are the two key attributes of NoSQL databases, the ones that have made them big data darlings. NoSQL databases haven't quite reached the hype heights of the Hadoop data management framework, but they're drawing a lot of attention and experimentation. Choose wisely among the many and varied NoSQL options, or the trade-offs needed to get scalability and flexibility might be your project's undoing.
The label NoSQL covers a diverse collection of databases that tend to have at least two elements in common: distributed computing architectures and schemaless design. The databases are scalable because they were built to store and manage data distributed across (typically) x86 commodity server clusters that can be easily scaled out by adding more machines. They're flexible because, unlike relational databases, NoSQL databases don't require a predefined schema (a.k.a. data model) that dictates a single way to organize data in columns and rows. In relational databases, those data models get ever more difficult to change as the database grows. That rigid data model becomes a problem if a company's evolving business model requires it to use data in a way it never anticipated.
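To make the schema point concrete, here is a minimal sketch in plain Python, not tied to any particular database. The column list, the two insert functions and the sample records are all hypothetical, invented only for illustration. One function enforces a predefined set of columns, the way a relational table would; the other accepts whatever shape of record it is given, the way a schemaless store would.

```python
# Hypothetical fixed schema, standing in for a relational table definition.
CUSTOMER_COLUMNS = ("id", "name", "email")


def insert_row(table, row):
    """Relational-style insert: the row must match the predefined columns."""
    if set(row) != set(CUSTOMER_COLUMNS):
        raise ValueError(f"row fields {sorted(row)} do not match the schema")
    table.append(tuple(row[col] for col in CUSTOMER_COLUMNS))


def insert_document(collection, doc):
    """Schemaless insert: any dict is stored as-is."""
    collection.append(dict(doc))


table, collection = [], []
first = {"id": 1, "name": "Ada", "email": "ada@example.com"}
insert_row(table, first)
insert_document(collection, first)

# A new business requirement introduces a field nobody planned for.
new_record = {"id": 2, "name": "Lin", "loyalty_tier": "gold"}
insert_document(collection, new_record)  # accepted with no schema change

try:
    insert_row(table, new_record)
    schema_rejected = False
except ValueError:
    # The fixed schema would have to be migrated before this row fits.
    schema_rejected = True
```

The flexibility has a cost the sketch also hints at: once records can take any shape, the application code reading them has to cope with fields that may or may not be present.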
NoSQL databases are also simple and inexpensive compared with their relational counterparts. The simplicity contributes to fast development and fast performance at scale. Many (though not all) NoSQL databases are open source, so you can get started with free community software and add commercial support and helpful commercial add-on modules as your deployment grows. Given that the biggest dissatisfaction with existing databases comes from licensing costs and terms, free and open will look appealing to many IT teams, especially those bootstrapping a pilot project.
Do these characteristics make NoSQL right for your company or project? They might, but there are drawbacks to NoSQL, most notably the lack of SQL querying capabilities and of ACID (atomicity, consistency, isolation and durability) transaction guarantees. Those drawbacks can be frustrating to relational database veterans. The capabilities of NoSQL databases are also diverse, so you have to find the right tool for the job.
"Adopt the technology by understanding what it's good at, and try that first," Rick Branson, infrastructure engineer at Instagram, told a recent NoSQL conference. That's good advice, so let's explore the diversity of options.
Whereas relational databases are general-purpose platforms, NoSQL databases have been developed to tackle particular, often extreme challenges. Amazon.com in 2007 came up with the Dynamo database to keep its massive, global e-commerce site always up and running. (It now sells DynamoDB as an Amazon Web Services online service.) Dynamo helped inspire Facebook's development of Cassandra, which it then contributed to open source. Relational databases just weren't designed to handle the quantity of data, number of users and ever-changing data requirements of outfits such as Amazon and Facebook.
Today, there are four important classes of NoSQL databases: key-value stores such as Riak and Redis; document databases such as MongoDB and Couchbase; wide-column databases such as Cassandra and HBase (the latter is part of the Hadoop framework); and graph databases such as Neo4j and AllegroGraph. All of those databases are well known among Internet startups and established Web-scale companies.
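The four classes differ mainly in how they shape data. The sketch below models each shape with ordinary Python structures. It is a simplification for illustration only and does not reflect the API or storage format of any product named above; every key, field and the neighbors helper is a made-up example.

```python
# Key-value store: an opaque value looked up by a single key.
key_value = {"session:42": "user=ada;expires=3600"}

# Document database: a self-describing record that can nest other data.
document = {"_id": 42, "name": "Ada", "orders": [{"sku": "A1", "qty": 2}]}

# Wide-column store: row key -> column family -> column name -> value.
# Different rows may carry different columns within a family.
wide_column = {
    "user:42": {
        "profile": {"name": "Ada"},
        "activity": {"2013-10-01": "login", "2013-10-02": "upload"},
    }
}

# Graph database: relationships are stored as first-class edges.
edges = [
    ("ada", "follows", "lin"),
    ("lin", "follows", "sam"),
    ("ada", "likes", "photo:7"),
]


def neighbors(node, relation):
    """Return the nodes reachable from `node` over edges labeled `relation`."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]


session = key_value["session:42"]                         # lookup by key
first_sku = document["orders"][0]["sku"]                  # walk a nested record
last_action = wide_column["user:42"]["activity"]["2013-10-02"]
followed = neighbors("ada", "follows")                    # traverse edges
```

Each access pattern at the bottom is the kind of operation its class is built for, which is why matching the data model to the workload matters more than picking the best-known name.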
Instagram, for example, implemented Cassandra in the fall of 2012 (a few months after Facebook acquired Instagram for $1 billion). The image-sharing service draws more than 150 million users a month, and each day those people add 55 million images and register 1.2 billion likes. The initial Cassandra deployment was a six-node cluster to replace a centralized security server-logging application that had been running on Redis, another NoSQL database. Redis is a key-value store whose in-memory design makes it fast. But Instagram's applications were growing quickly, outstripping RAM capacity that would have been expensive to scale, says Branson.