8/8/2014
09:06 AM
Bharat Upadrasta and Austin Chungath

NoSQL, NewSQL, or RDBMS: How To Choose

When should you choose a NoSQL or NewSQL option versus a conventional relational database management system? Here are 10 telltale traits that will help you make the right choice.

Today's databases are not only expected to be flexible enough to handle a variety of data formats, they're also expected to deliver extreme performance and to scale to handle humongous data volumes. Database architects have responded with NoSQL and NewSQL alternatives to relational database management systems (RDBMS), but how do you know when to choose which option?

To answer this question, start with a fundamental understanding of all three technologies. RDBMS can guarantee performance on the order of thousands of transactions per second. But the new face of online transaction processing (OLTP) in scenarios such as real-time advertising, fraud detection, multi-player games, and risk analysis, to name a few, involves close to a million transactions per second -- a pace that traditional RDBMS typically can't handle.

RDBMS have always been distinguished by the ACID principle set (atomicity, consistency, isolation, and durability), which ensures that data integrity is preserved at all costs. SQL became the de facto standard of data processing because it combines elements like data definition, data manipulation, and data querying, all under one umbrella.
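As a rough illustration of the atomicity part of ACID, the sketch below uses SQLite through Python's standard sqlite3 module. The accounts table, the names in it, and the transfer helper are hypothetical and exist only for this example; the point is that a transfer which fails halfway leaves no partial update behind.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as a single all-or-nothing unit."""
    # Using the connection as a context manager commits if the block
    # succeeds and rolls back if any statement raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        # If this debit violates the CHECK constraint, the credit above
        # is rolled back as well.
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "alice", "bob", 500) on the starting data raises sqlite3.IntegrityError, and both balances stay as they were, because the credit and the debit belong to the same transaction.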

NoSQL database management systems store data in a variety of formats, chief among them being document store, graph store, and key-value store. Most NoSQL products jettison ACID guarantees to achieve data storage flexibility. They remove hard constraints, such as tabular row-store and strict data definitions, and they provision for scale with distributed architectures supporting high-performance throughput.
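The three storage formats named above can be pictured with plain Python structures. This is only a toy model, not how any particular NoSQL product stores data; every key, field, and helper name below is made up for illustration.

```python
import json

# Key-value: the store looks records up by key and treats the value
# as an opaque blob (here, a JSON string).
key_value = {}
key_value["session:42"] = json.dumps({"user": "u1", "cart": ["sku-9"]})

# Document: each record describes itself, so two records in the same
# collection can carry different fields without a schema change.
documents = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Lin", "phones": ["555-0100"], "address": {"city": "Pune"}},
]

# Graph: nodes plus edges, stored here as an adjacency mapping.
graph = {"ada": {"lin"}, "lin": {"ada", "sam"}, "sam": set()}


def fields_used(docs):
    """Return the sorted union of field names across all documents."""
    return sorted({field for doc in docs for field in doc})


def neighbors_within(adjacency, start, hops):
    """Return the nodes reachable from start in at most `hops` edges."""
    seen = {start}
    frontier = {start}
    for _ in range(hops):
        # Expand one hop, skipping nodes that were already visited.
        frontier = {n for node in frontier for n in adjacency.get(node, ())} - seen
        seen |= frontier
    return seen - {start}
```

The document list shows the flexibility the paragraph describes: the second record adds phones and address without any change to the first. A relational table would need a schema change or extra tables to hold the same data.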

The newest entrants in the database arena, NewSQL, retain both SQL and ACID, but they overcome the performance overhead of RDBMS caused by features such as latching shared data structures, buffer pooling, record level locking, and write-ahead logging, primarily by embracing distributed computing architectures.

How do you choose?
To address the choice of database types, start with the following questions:

  • To what extent do you rely on data in terms of storage, processing, and analysis? The degree of dependency in each area can hugely shape the choice of a database. Application development, for example, is not heavily data centric, but data analysis is. Certain businesses revolve around data while others use data to supplement their core focus areas.
  • How important are the scale, flexibility, and performance aspects of a DBMS?
  • What is your level of investment in incumbent technologies? If you're already invested in a DBMS, are you prepared to incur the cost of migrating to a newer technology (and possibly face feature incompatibilities or administrative and programming skill gaps among your staff)?

Table 1 below sheds light on the comparative capabilities and strengths of RDBMS, NoSQL, and NewSQL databases.

Table 1: 10 Selection Criteria For Choosing Database Types

 Characteristic                                 RDBMS  NoSQL     NewSQL
 ACID compliance (Data, Transaction integrity)  Yes    No        Yes
 OLAP/OLTP                                      Yes    No        Yes
 Data analysis (aggregate, transform, etc.)     Yes    No        Yes
 Schema rigidity (Strict mapping of model)      Yes    No        Maybe
 Data format flexibility                        No     Yes       Maybe
 Distributed computing                          Yes    Yes       Yes
 Scale up (vertical)/Scale out (horizontal)     Yes    Yes       Yes
 Performance with growing data                  Fast   Fast      Very Fast
 Performance overhead                           Huge   Moderate  Minimal
 Popularity/community Support                   Huge   Growing   Slowly growing
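One way to put Table 1 to work is to encode its Yes/No/Maybe rows as data and filter on the characteristics a project cannot do without. The sketch below copies only those rows from the table. The short keys, the candidates helper, and the rule that "Maybe" counts as acceptable are assumptions made for this example, and the result is only as reliable as the table's own generalizations.

```python
DB_TYPES = ("RDBMS", "NoSQL", "NewSQL")

# Yes/No/Maybe rows of Table 1, keyed by a short hypothetical label.
TABLE_1 = {
    "acid": {"RDBMS": "Yes", "NoSQL": "No", "NewSQL": "Yes"},
    "olap_oltp": {"RDBMS": "Yes", "NoSQL": "No", "NewSQL": "Yes"},
    "data_analysis": {"RDBMS": "Yes", "NoSQL": "No", "NewSQL": "Yes"},
    "schema_rigidity": {"RDBMS": "Yes", "NoSQL": "No", "NewSQL": "Maybe"},
    "format_flexibility": {"RDBMS": "No", "NoSQL": "Yes", "NewSQL": "Maybe"},
    "distributed": {"RDBMS": "Yes", "NoSQL": "Yes", "NewSQL": "Yes"},
    "scale": {"RDBMS": "Yes", "NoSQL": "Yes", "NewSQL": "Yes"},
}


def candidates(required, table=TABLE_1):
    """Return the database types not marked 'No' for any required row.

    Assumption: a 'Maybe' entry is treated as acceptable.
    """
    return [
        db for db in DB_TYPES
        if all(table[row][db] != "No" for row in required)
    ]
```

For instance, candidates(["acid"]) keeps RDBMS and NewSQL, while candidates(["acid", "format_flexibility"]) leaves only NewSQL, which is the trade-off the table implies.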

The nature of your data ultimately dictates the choice of database technologies. For instance, transactional data that requires strict compliance with data integrity and consistency favors the usage of RDBMS and NewSQL over NoSQL.

Volatile data, on the other hand, is characterized by changing object models and data structure formats that demand flexibility and make NoSQL the top choice followed by NewSQL to a lesser extent. RDBMS, with their rigidity of schema design, can prove very costly when dealing with such data.

When it comes to scale, enterprises usually prefer to scale out horizontally, an architectural approach that is cost effective and that generally offers better fault tolerance. Scaling an RDBMS involves distributing data across multiple nodes, which can make data maintenance chaotic. NoSQL and NewSQL products are not limited by such constraints and are much easier to maintain when scaling out.
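To make "distributing data across multiple nodes" concrete, here is a minimal sketch of hash-based sharding, one common way to scale out. It is not taken from any specific product. The ShardedStore class and shard_for function are hypothetical, and each "node" is just an in-memory dictionary.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a string key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store that spreads keys across several shards."""

    def __init__(self, num_shards):
        # Each dict stands in for a separate database node.
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)

    def scan(self, predicate):
        """Return values matching predicate; this must visit every shard."""
        return [
            value
            for shard in self.shards
            for value in shard.values()
            if predicate(value)
        ]
```

Single-key reads and writes touch exactly one shard, which is why this layout scales. Anything that spans keys, such as the scan method here or a relational join, has to visit every shard. That is part of why sharding a conventional RDBMS by hand becomes a maintenance burden.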

When considering performance, the determining factors in database choice are the underlying data formats and the number of operations being performed. It's impossible to say whether one type of database will be faster than another without context, but literature and benchmarks suggest that NewSQL products have outperformed NoSQL and SQL in areas such as elastic scalability and transactions processed per second. This is important for e-commerce businesses handling order tracking or inventory management and for online gaming businesses handling millions of transactions per second, to cite a couple of examples.

Database technologies are quickly adapting to keep up with exploding data volumes, growing data variety, and increases in data velocity. The array of RDBMS, NoSQL, and NewSQL options is vast, so it's important to gather detailed requirements on virtually every aspect of data consumption before making a choice. Will NoSQL or NewSQL dislodge the RDBMS as the industry standard? The processing of that query is underway.

Austin Chungath is a senior research analyst in the Innovation & Development Group at Mu Sigma. Write to him at Austin.cv@mu-sigma.com.

Bharat Upadrasta is a senior research lead in the Innovation & Development Group at Mu Sigma. Write to him at bharat.upadrasta@mu-sigma.com.
Comments
adamfowleruk
User Rank: Apprentice
8/14/2014 | 4:52:53 AM
Re: Exceptions to every generalization
May I gently suggest more editing in the future. Far too general comments in the piece, without any supporting evidence other than 'mysterious people said something' (probably on a forum on StackOverflow.com), and factual inaccuracies. 

This type of article is the standard poorly researched material that is being published by proponents of each individual type of database. My own area (NoSQL) is of course no exception. Lots of FUD being spread all over. Reading the article it's obvious there is a bias towards NewSQL by this author. Caveat emptor.

Generalisations are very dangerous. In particular those on ACID compliance, OLTP use cases, data warehousing, and grouping all NoSQL databases together (when there are four distinct and different types) are dangerous generalisations to make. Also grouping horizontal and vertical scalability in the same feature row is entirely missing the point of the different approaches.

I would suggest a certain upcoming book should be pre-ordered, but doubt such a comment would get past the admins. 8o)
MichelleMcLean
User Rank: Apprentice
8/10/2014 | 11:56:02 AM
getting HA and high performance out of SQL
RDBMS software has lagged in offering easy ways to scale and improve performance. New techniques like database traffic management software deliver the best of both worlds: the data integrity and broad app support of SQL with the high performance of NoSQL. It drops in transparently and within minutes performs functions such as read/write split and load balancing on behalf of the app, so no app rewrites are needed. It scales single-server deployments with connection management and caching. It is an easy alternative to rewriting for NoSQL. Check out www.scalearc.com for more info on this new technology.
D. Henschen
User Rank: Author
8/8/2014 | 11:12:45 AM
Exceptions to every generalization
I edited this piece and challenged the authors not to fall back on easy generalizations. Some say ACID performance and NoSQL don't mix, for example, but that's not always true. Aerospike, for instance, supports ultra-high-speed, ACID-compliant performance, yet it's a NoSQL database. That's just one example. You should also consider that NewSQL databases are, in fact, relational database management systems, but they're taking advantage of relatively new architectural attributes that weren't around when the well-known incumbent RDBMS were invented.

A few years from now when the NewSQL products start to get old, I wonder what we'll call them? 