Today's databases are not only expected to be flexible enough to handle a variety of data formats, they're also expected to deliver extreme performance and to scale to handle humongous data volumes. Database architects have responded with NoSQL and NewSQL alternatives to relational database management systems (RDBMS), but how do you know when to choose which option?
To answer this question, start with a fundamental understanding of all three technologies. RDBMS can guarantee performance on the order of thousands of transactions per second. But the new face of online transaction processing (OLTP) in scenarios such as real-time advertising, fraud detection, multi-player games, and risk analysis, to name a few, involves close to a million transactions per second -- a pace that traditional RDBMS typically can't handle.
RDBMS have always been distinguished by the ACID principle set (atomicity, consistency, isolation, and durability), which ensures that data integrity is preserved at all costs. SQL became the de facto standard of data processing because it combines elements like data definition, data manipulation, and data querying, all under one umbrella.
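To make the atomicity guarantee concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the balance rule, and the helper names (make_db, transfer, balances) are hypothetical and exist only for illustration. A transfer that would overdraw an account is rolled back as a whole, so the credit that already succeeded is undone too.

```python
import sqlite3


def make_db():
    """Create a hypothetical in-memory accounts table (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; True if committed."""
    try:
        # The connection context manager commits on success and rolls back
        # the whole transaction if any statement raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint when funds are short.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account_id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

Calling transfer(conn, 1, 2, 500) against the starting balances returns False and leaves both rows unchanged, which is the all-or-nothing behavior that transactional workloads depend on.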
NoSQL database management systems store data in a variety of formats, chief among them being document store, graph store, and key-value store. Most NoSQL products jettison or relax ACID guarantees to achieve data storage flexibility. They remove hard constraints, such as tabular row-store and strict data definitions, and they provision for scale with distributed architectures supporting high-performance throughput.
The newest entrants in the database arena, NewSQL products, retain both SQL and ACID, but they overcome the performance overhead of RDBMS caused by features such as latching of shared data structures, buffer pooling, record-level locking, and write-ahead logging, primarily by embracing distributed computing architectures.
How do you choose?
To address the choice of database types, start with the following questions:
- To what extent do you rely on data in terms of storage, processing, and analysis? The degree of dependency in each area can strongly shape the choice of a database. Application development, for example, is not heavily data-centric, but data analysis is. Certain businesses revolve around data while others use data to supplement their core focus areas.
- How important are the scale, flexibility, and performance aspects of a DBMS?
- What is your level of investment in incumbent technologies? If you're already invested in a DBMS, are you prepared to incur the cost of migrating to a newer technology (and possibly face feature incompatibilities or administrative and programming skill gaps among your staff)?
Table 1 below sheds light on the comparative capabilities and strengths of RDBMS, NoSQL, and NewSQL databases.
The nature of your data ultimately dictates the choice of database technologies. For instance, transactional data that requires strict compliance with data integrity and consistency favors the usage of RDBMS and NewSQL over NoSQL.
Volatile data, on the other hand, is characterized by changing object models and data structure formats that demand flexibility and make NoSQL the top choice followed by NewSQL to a lesser extent. RDBMS, with their rigidity of schema design, can prove very costly when dealing with such data.
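The contrast between a rigid schema and a flexible one can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the users table, the loyalty_tier field, and the helper names. The "document store" is just a dictionary of JSON strings standing in for a real NoSQL product.

```python
import json
import sqlite3


def make_users_table():
    """Create a hypothetical fixed-schema table with two columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    return conn


def insert_row(conn, row):
    """Insert a dict as a row; return False if the schema rejects it.

    Column names are interpolated from trusted, hard-coded keys purely for
    illustration; never build SQL this way from user input.
    """
    cols = ", ".join(row)
    marks = ", ".join("?" for _ in row)
    try:
        with conn:
            conn.execute(
                f"INSERT INTO users ({cols}) VALUES ({marks})",
                tuple(row.values()),
            )
        return True
    except sqlite3.OperationalError:
        # Raised when the row names a column the table does not have.
        return False


def put_document(store, doc_id, doc):
    """Store any JSON-serializable document; no schema is enforced."""
    store[doc_id] = json.dumps(doc)


def get_document(store, doc_id):
    """Load a document back into a Python dict."""
    return json.loads(store[doc_id])
```

Inserting {"id": 2, "name": "Bob", "loyalty_tier": "gold"} into the table fails until someone alters the schema, while the same record goes into the document store unchanged. The trade-off is that the document store leaves validation to the application.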
When it comes to scale, enterprises usually prefer to scale out horizontally, an architectural approach that is cost effective and that generally offers better fault tolerance. Scaling an RDBMS this way involves distributing data across multiple nodes, which can make data maintenance chaotic. NoSQL and NewSQL products are built for distribution, so they are less limited by such constraints and are typically much easier to maintain when scaling out.
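Distributing data across nodes usually starts with a rule that maps each key to a node. The following is a minimal sketch of hash-based sharding in Python, with in-memory dictionaries standing in for database nodes. The names shard_for and ShardedStore are hypothetical, and this is not how any particular product implements partitioning.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store that spreads keys over several in-memory shards."""

    def __init__(self, num_shards: int):
        # Each dict stands in for a separate database node.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        # Returns None when the key is absent from its shard.
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

With simple modulo placement, changing the number of shards remaps most keys, which hints at why retrofitting scale-out onto a single-node database is painful. Distributed systems often use consistent hashing or range partitioning to limit how much data has to move.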
When considering performance, the determining factors in database choice are the underlying data formats and the number of operations being performed. It's impossible to say whether one type of database will be faster than another without context, but literature and benchmarks suggest that NewSQL products have outperformed NoSQL and traditional SQL databases in areas such as elastic scalability and transactions processed per second. This is important for e-commerce businesses handling order tracking or inventory management and for online gaming businesses handling millions of transactions per second, to cite a couple of examples.
Database technologies are quickly adapting to keep up with exploding data volumes, growing data variety, and increases in data velocity. The array of RDBMS, NoSQL, and NewSQL options is vast, so it's important to gather detailed requirements on virtually every aspect of data consumption before making a choice. Will NoSQL or NewSQL dislodge the RDBMS as the industry standard? The processing of that query is underway.