A popular new movement aims to take SQL database management systems out of the stack. But when is this emerging approach right for you?
Assumptions and Scenarios
The challenges to traditional database management ideas are piling up. For years, world-class applications have been built using MySQL, hardly the most robust of DBMS. These applications haven't used more than a tiny fraction of MySQL's capabilities. Indeed, the biggest systems have relied on "sharding" MySQL -- putting different rows of a MySQL table onto different machines, and relying on application logic to know which machine to access. If those applications use any joins at all, they're only ones that will never cause data to move from one node to another as part of the join resolution. The same applications often also rely on an in-memory key-value store called memcached. (More on the "key-value" data model below.)
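As a rough illustration of the application-side routing that sharding implies, here is a minimal Python sketch. The host names, the choice of a user ID as the sharding key, and the hash-modulo placement rule are all assumptions made for the example; real deployments choose their own keys and placement schemes.

```python
import hashlib

# Hypothetical shard host names, for illustration only.
SHARDS = [
    "db0.example.internal",
    "db1.example.internal",
    "db2.example.internal",
]


def shard_for(user_id, shards=SHARDS):
    """Return the shard assumed to hold every row for this user.

    A stable hash (not Python's randomized hash()) keeps the mapping
    identical across processes and restarts.
    """
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def can_join_locally(user_id_a, user_id_b):
    """True if both users' rows live on one shard, so a join between
    them would not move data from one node to another."""
    return shard_for(user_id_a) == shard_for(user_id_b)
```

The application, not the DBMS, calls something like shard_for() before every query. That is why joins in such systems tend to be limited to rows that share a sharding key.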
And since all that isn't already far enough from the relational DBMS mainstream for some developers' tastes, it's beginning to be superseded by a popular new movement called "NoSQL," which aspires to get SQL-based DBMS out of the stack entirely.
Before going further, let's clear up one point: "NoSQL database" is not the same as "non-SQL DBMS." True, NoSQL technically stands for "Not Only SQL"; but taking that to an extreme (which some marketers do) is misleading. After all, non-SQL DBMS have flourished literally since the invention of database management systems more than 40 years ago. Some leading pre-relational mainframe DBMS -- notably IMS and Adabas -- survive to this day. Small enterprise databases, built on Microsoft Access or Apple FileMaker, may have nothing to do with SQL. Though medium-sized enterprises usually run on relational DBMS, Intersystems Cache and various "multi-value" systems also have had considerable success. Even large enterprises often use special-purpose systems, such as multidimensional "OLAP" servers, but these don't have much to do with the NoSQL market.
Rather, NoSQL DBMS start from three design premises:
Transaction semantics are unimportant, and locking is downright annoying.
Joins are also unimportant, especially joins of any complexity.
There are some benefits to having a DBMS even so.
NoSQL DBMS further incorporate one or more of three assumptions:
The database will be big enough that it should be scaled across multiple servers.
The application should run well if the database is replicated across multiple geographically distributed data centers, even if the connection between them is temporarily lost.
The application should run well if the database is replicated across a host server and a bunch of occasionally-connected mobile devices.
In addition, NoSQL advocates commonly favor the idea that a database should have no fixed schema, other than whatever emerges as a byproduct of the application-writing process.
"Not Only SQL" is hardly the only terminological problem around NoSQL. Much of the innovation in the NoSQL arena revolves around "consistency," but that word does not mean the same thing as it does in ACID (Atomicity, Consistency, Isolation, Durability). If anything, consistency is closer to "durability," in that it refers to the desirable property of getting a correct answer back from the DBMS even in a condition of (partial) failure. In essence, there are three reasonable approaches to consistency in a replicated data scenario:
1. Traditional/near-perfect consistency, in which processing stops until the system is assured that an update has propagated to all replicas. (This is typically enforced via a two-phase commit protocol.) The downside to this model, of course, is that a single node failure can bring at least part of the system to a halt.
2. Eventual consistency, in which inaccurate reads are permissible just so long as the data is synchronized "eventually." With eventual consistency, the network is rarely a bottleneck at all -- but data accuracy may be less than ideal.
3. Read-your-writes (RYW) consistency, in which data from any single write is guaranteed to be read accurately, even in the face of a small number of network outages or node failures. However, a sequence of errors can conceivably produce inaccurate reads in ways that perfect consistency would forbid.
Some systems let you choose the consistency model (for example, through configuration settings); others are more locked in to a particular choice.
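To make the difference between the first two models concrete, here is a toy Python sketch of a replicated key-value store with a configurable consistency mode. It is not modeled on any particular product: the class name, the "strict" and "eventual" mode names, and the synchronize() method are all hypothetical, and the "strict" mode simply writes to every in-memory replica before returning rather than running a real two-phase commit.

```python
class ReplicatedStore:
    """Toy in-memory store with several replicas (illustrative only)."""

    def __init__(self, replica_count=3, mode="strict"):
        if mode not in ("strict", "eventual"):
            raise ValueError("mode must be 'strict' or 'eventual'")
        self.mode = mode
        self.replicas = [dict() for _ in range(replica_count)]
        # Updates not yet applied to a replica: (replica_index, key, value)
        self.pending = []

    def write(self, key, value):
        # Replica 0 always receives the write immediately.
        self.replicas[0][key] = value
        for i in range(1, len(self.replicas)):
            if self.mode == "strict":
                # Do not return until every replica has the update.
                self.replicas[i][key] = value
            else:
                # Return right away; the other replicas catch up later.
                self.pending.append((i, key, value))

    def read(self, key, replica=0):
        # Returns None if this replica has not seen the key yet.
        return self.replicas[replica].get(key)

    def synchronize(self):
        # Apply queued updates in order, making the replicas converge.
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()
```

In "eventual" mode, a read against replica 2 immediately after a write returns None (a stale answer) until synchronize() runs. In "strict" mode the same read returns the new value, at the cost that every write has to reach every replica first.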
The theory behind all this is Eric Brewer's CAP Theorem, for Consistency, Availability, and Partition Tolerance, the point being that you can't have all three of those in the same system. But be warned -- "Availability" and "Partition" are used in unconventional word-senses too.