
NoSQL Basics, Benefits and Best-Fit Scenarios

A popular new movement aims to take SQL database management systems out of the stack. But when is this emerging approach right for you?

Assumptions and Scenarios

The challenges to traditional database management ideas are piling up. For years, world-class applications have been built using MySQL, hardly the most robust of DBMS. These applications haven't used more than a tiny fraction of MySQL's capabilities. Indeed, the biggest systems have relied on "sharding" MySQL -- putting different rows of a MySQL table onto different machines, and relying on application logic to know which machine to access. If those applications use any joins at all, they're only ones that will never cause data to move from one node to another as part of the join resolution. The same applications often also rely on an in-memory key-value store called memcached. (More on the "key-value" data model below.)
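To make that pattern concrete, here is a minimal Python sketch of hash-based shard routing with a memcached-style cache in front. It is not tied to any particular MySQL driver or memcached client; the plain dicts below are stand-ins for real shard databases and the cache, and all names are illustrative.

```python
import hashlib

# Hypothetical shard map: in a real deployment each entry would be a
# connection to a separate MySQL server; dicts stand in for the shards.
SHARDS = [dict() for _ in range(4)]

# Simulated memcached layer: an in-memory key-value cache in front of the shards.
CACHE = {}

def shard_for(user_id):
    """Application-level routing: hash the key to pick a shard.
    The application, not the DBMS, knows which machine holds which rows."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

def write_user(user_id, record):
    shard_for(user_id)[user_id] = record
    CACHE.pop(user_id, None)   # invalidate any cached copy on write

def read_user(user_id):
    if user_id in CACHE:       # cache hit: skip the database entirely
        return CACHE[user_id]
    record = shard_for(user_id).get(user_id)
    CACHE[user_id] = record    # populate the cache for later reads
    return record

write_user(42, {"name": "Alice"})
print(read_user(42))           # -> {'name': 'Alice'}
```

Note what's missing: nothing here can join rows across shards, which is exactly why such applications restrict themselves to joins that stay on one node.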

And as if all that weren't already far enough from the relational DBMS mainstream for some developers' tastes, it's beginning to be superseded by a popular new movement called "NoSQL," which aspires to get SQL-based DBMS out of the stack entirely.

Before going further, let's clear up one point: "NoSQL database" is not the same as "non-SQL DBMS." True, NoSQL technically stands for "Not Only SQL"; but taking that to an extreme (which some marketers do) is misleading. After all, non-SQL DBMS have flourished literally since the invention of database management systems more than 40 years ago. Some leading pre-relational mainframe DBMS -- notably IMS and Adabas -- survive to this day. Small enterprise databases, built on Microsoft Access or Apple FileMaker, may have nothing to do with SQL. Though medium-sized enterprises usually run on relational DBMS, InterSystems Caché and various "multi-value" systems also have had considerable success. Even large enterprises often use special-purpose systems, such as multidimensional "OLAP" servers, but these don't have much to do with the NoSQL market.

Rather, NoSQL DBMS start from three design premises:

  • Transaction semantics are unimportant, and locking is downright annoying.
  • Joins are also unimportant, especially joins of any complexity.
  • There are some benefits to having a DBMS even so.

NoSQL DBMS further incorporate one or more of three assumptions:

  • The database will be big enough that it should be scaled across multiple servers.
  • The application should run well if the database is replicated across multiple geographically distributed data centers, even if the connection between them is temporarily lost.
  • The database should run well if it is replicated across a host server and a bunch of occasionally connected mobile devices.

In addition, NoSQL advocates commonly favor the idea that a database should have no fixed schema, other than whatever emerges as a byproduct of the application-writing process.
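For illustration, here is what "no fixed schema" can look like in practice -- a hypothetical document-store sketch in Python, where records in the same collection need not share fields. The collection and field names are invented for this example.

```python
# Hypothetical document "collection": there is no fixed schema, so each
# document carries whatever fields the application happened to give it.
users = []

users.append({"id": 1, "name": "Alice", "email": "alice@example.com"})
users.append({"id": 2, "name": "Bob", "twitter": "@bob"})      # no email field
users.append({"id": 3, "name": "Carol", "tags": ["admin"]})    # extra field

# Queries must tolerate missing fields; the "schema" lives implicitly
# in the application code, not in the database.
emails = [u["email"] for u in users if "email" in u]
print(emails)   # -> ['alice@example.com']
```

The flexibility is real, but so is the trade-off: every consumer of the data has to cope with whatever shapes earlier versions of the application wrote.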

"Not Only SQL" is hardly the only terminological problem around NoSQL. Much of the innovation in the NoSQL arena revolves around "consistency," but that word does not mean the same thing as it does in ACID (Atomicity, Consistency, Isolation, Durability). If anything, consistency is closer to "durability," in that it refers to the desirable property of getting a correct answer back from the DBMS even in a condition of (partial) failure. In essence, there are three reasonable approaches to consistency in a replicated data scenario:

1. Traditional/near-perfect consistency, in which processing stops until the system is assured that an update has propagated to all replicas. (This is typically enforced via a two-phase commit protocol.) The downside to this model, of course, is that a single node failure can bring at least part of the system to a halt.
2. Eventual consistency, in which inaccurate reads are permissible just so long as the data is synchronized "eventually." With eventual consistency, the network is rarely a bottleneck at all -- but data accuracy may be less than ideal.
3. Read-your-writes (RYW) consistency, in which data from any single write is guaranteed to be read accurately, even in the face of a small number of network outages or node failures. However, a sequence of errors can conceivably produce inaccurate reads in ways that perfect consistency would forbid.

Some systems allow the choice of consistency model to be tuned via configuration; others are locked into a particular choice.
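To see how the RYW model in item 3 above can actually be achieved, consider this toy quorum sketch -- a simplified, hypothetical model in the spirit of Dynamo-style stores, not any particular product's API. N is the replica count, W and R the write and read quorum sizes; as long as R + W > N, every read quorum overlaps every successful write quorum, so a single failure can't hide the latest write.

```python
import itertools

# A toy cluster of N replicas; each replica holds {key: (version, value)}.
# All names and numbers here are illustrative.
N, W, R = 3, 2, 2            # R + W > N yields read-your-writes behavior
replicas = [dict() for _ in range(N)]
clock = itertools.count(1)   # monotonically increasing version numbers

def write(key, value, down=()):
    """A write succeeds once W replicas acknowledge it; 'down' simulates
    replicas that are unreachable and therefore miss the update."""
    version = next(clock)
    acks = 0
    for i, replica in enumerate(replicas):
        if i in down:
            continue
        replica[key] = (version, value)
        acks += 1
    if acks < W:
        raise RuntimeError("write failed: quorum not reached")

def read(key, down=()):
    """Ask R replicas and return the freshest version seen. Because
    R + W > N, any read quorum overlaps any successful write quorum,
    so at least one responder has the latest write."""
    responses = []
    for i, replica in enumerate(replicas):
        if i in down:
            continue
        responses.append(replica.get(key, (0, None)))  # (0, None) = "not found"
        if len(responses) == R:
            break
    if len(responses) < R:
        raise RuntimeError("read failed: quorum not reached")
    return max(responses)[1]   # highest version wins

write("greeting", "hello", down={2})   # replica 2 misses the write
print(read("greeting", down={0}))      # still reads 'hello' despite a failure
```

The same knobs illustrate the tuning just mentioned: pushing W and R toward N approaches traditional consistency, while dropping them toward 1 gives something like eventual consistency -- and in tunable systems such settings can often vary per operation.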

The theory behind all this is Eric Brewer's CAP Theorem -- for Consistency, Availability, and Partition tolerance -- the point being that you can't have all three of those in the same system. But be warned: "Availability" and "Partition" are used in unconventional word-senses too.
