There is a typical story cycle in IT: Every new technology destroys and replaces an older one. PCs displaced minicomputers. Smartphones unseated cameras and flip phones. Online streaming wiped out video rental and music CDs.
So big data technologies should wipe out relational database management systems (RDBMS), right? That's not how the future is shaping up.
Peaceful coexistence is turning out to be the norm, as the two technologies prove to be complementary, not exclusive. As much as casual observers would like to see big data technologies win the future, RDBMS (the basis for SQL and database systems such as Microsoft SQL Server, IBM DB2, Oracle, and MySQL) is going to stick around for a bit longer.
In an interview with InformationWeek, Meta S. Brown, president of A4A Brown and author of "Data Mining for Dummies," said relational databases and big data technologies "have to coexist indefinitely. Neither one is capable of eclipsing the other."
As a consulting analyst, Brown is agnostic on which database technology will prevail, and looks instead for the method that provides the solution. For example, if you need the data to deliver precise answers, then "you've got to use a relational database," she said. "If you need an approximate answer in a big hurry," then a NoSQL database is the way to go.
"RDBMS isn't going anywhere for transactional systems," said David Teplow, founder and CEO of Integra Technology Consulting, in an interview with InformationWeek. Teplow has been a longtime user of RDBMS, going all the way back to the early 1980s with the release of Oracle 2.0.
In the 1990s, the need to measure and analyze data drove the construction of data warehouses. "[RDBMS] replaced anything else that had ever been used," Teplow said. "It became the de facto standard for data storage."
Only when the increased volume, velocity, and variety of data became apparent did the need for big data systems -- and the systems themselves -- come about. RDBMS is still good on the volume front, but its fundamental nature makes it ill-suited for velocity and variety, Teplow said. In an RDBMS, data must conform to a predefined schema. Data coming in too fast and too heterogeneously -- think Facebook likes, GPS coordinates, and Web logs -- cannot be easily classified for RDBMS purposes. "That's where Hadoop and NoSQL take over."
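To make the schema point concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite table standing in for the RDBMS and a plain list of JSON strings standing in for a schemaless store. The event shapes, the likes table, and the function names are all hypothetical, chosen only for illustration.

```python
import json
import sqlite3

# Hypothetical incoming events with three different shapes.
events = [
    {"type": "like", "user": "alice", "post_id": 42},
    {"type": "gps", "user": "bob", "lat": 42.36, "lon": -71.06},
    {"type": "weblog", "ip": "203.0.113.7", "path": "/index.html", "status": 200},
]

# Relational side: the columns are fixed before any data arrives.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE likes (user_name TEXT NOT NULL, post_id INTEGER NOT NULL)"
)


def store_relational(event):
    """Try to insert the event into the fixed schema; report whether it fit."""
    try:
        conn.execute(
            "INSERT INTO likes (user_name, post_id) VALUES (?, ?)",
            (event.get("user"), event.get("post_id")),
        )
        return True
    except sqlite3.IntegrityError:
        # A missing field becomes NULL, which the NOT NULL constraint rejects.
        return False


# Schemaless side (an assumption, not a real NoSQL client): store each
# record as a self-describing JSON document, whatever its fields are.
document_store = []


def store_document(event):
    document_store.append(json.dumps(event))


accepted = [store_relational(e) for e in events]
for e in events:
    store_document(e)

print(accepted)             # [True, False, False]
print(len(document_store))  # 3
```

Only the event that matches the predefined columns lands in the table, while the document list takes all three. The trade-off is that the table guarantees every row has the same fields, and the document list guarantees nothing.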
Another way to look at the RDBMS/big data split is to look at centralization versus distributed architecture, said Lyn Robison, vice president and research director for data management strategies at Gartner Group.
RDBMS is about centralization. "The server owns and guards the data, ensuring its consistency," Robison said. Updates are serialized and sequenced. Access is also limited. "It is possible you could get too many client requests. A relational database will tell the client requests it cannot handle, 'Sorry. I'm too busy.'"
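The gatekeeper behavior Robison describes can be sketched as a toy model. This is not how any particular database engine works internally; the SingleServer class, its max_pending limit, and the balance it tracks are hypothetical names used only to show serialized updates and a server that turns clients away when it is full.

```python
from collections import deque


class SingleServer:
    """Toy model of one server that owns the data and applies updates in order."""

    def __init__(self, max_pending):
        self.max_pending = max_pending  # how many requests may wait at once
        self.pending = deque()
        self.balance = 0
        self.log = []  # the exact order in which updates were applied

    def submit(self, amount):
        """Queue an update, or refuse it when the server is at capacity."""
        if len(self.pending) >= self.max_pending:
            return False  # the "Sorry. I'm too busy." case
        self.pending.append(amount)
        return True

    def process_all(self):
        """Apply queued updates one at a time, in arrival order."""
        while self.pending:
            amount = self.pending.popleft()
            self.balance += amount
            self.log.append(amount)


server = SingleServer(max_pending=2)
results = [server.submit(amount) for amount in (5, -3, 10)]
server.process_all()

print(results)         # [True, True, False]
print(server.balance)  # 2
print(server.log)      # [5, -3]
```

Because a single component applies every change, the log is an unambiguous history. The cost is that the third client is refused rather than served.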
Adding capacity to a relational database means adding more memory, disk space, and computing power, but only for that single gatekeeper/repository, Robison said. In the realm of big data, which relies on NoSQL, you split the data among many servers, and each one hosts a smaller slice with every server added via the cloud.
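Splitting data among many servers can be illustrated with a simple hash-based sketch. The assumptions here are loud ones: the shard_for and distribute functions are hypothetical, the keys are made up, and plain modulo hashing is used for brevity even though it reshuffles most keys whenever the server count changes, which is why production systems often use more careful schemes.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys, num_shards):
    """Group keys by the shard that would host them."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


keys = [f"user-{i}" for i in range(1000)]  # hypothetical record keys

four_servers = distribute(keys, 4)
eight_servers = distribute(keys, 8)

# Each server's slice shrinks as servers are added.
print([len(s) for s in four_servers])
print([len(s) for s in eight_servers])
```

With four servers each one holds roughly a quarter of the keys, and with eight roughly an eighth, which is the scale-out pattern described above in contrast to making one server bigger.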
Consistency and accuracy are the benefits of the relational database approach. If one provided access to many servers for many clients under the big data approach, different entries would cause data variance between servers, Robison said. "Eventually, it becomes consistent." In the meantime, the company loses the sequence of the updates. "You kind of have to guess what happened. You can have data highly consistent but not always available, or data be readily available, but not consistent."
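The "eventually, it becomes consistent" behavior can be sketched with two toy replicas that accept writes independently and later reconcile. The Replica class and the last-write-wins rule based on a supplied timestamp are assumptions made for illustration; real systems use a variety of reconciliation strategies.

```python
class Replica:
    """Toy replica that keeps, per key, the value with the newest timestamp."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        """Pull the other replica's entries; newer timestamps win."""
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


a = Replica("a")
b = Replica("b")

# Two clients update the same record on different servers.
a.write("stock", 10, timestamp=1)
b.write("stock", 7, timestamp=2)

before = (a.read("stock"), b.read("stock"))

# Later, the replicas exchange state and converge.
a.sync_from(b)
b.sync_from(a)

after = (a.read("stock"), b.read("stock"))

print(before)  # (10, 7)
print(after)   # (7, 7)
```

Before the exchange, the answer depends on which server a client happens to ask. After it, both agree, but the earlier write has simply been overwritten and no record of the update sequence remains.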
Data Is Relative, Governance Is Absolute
Relational databases also have a rich legacy of governance -- tools and apps to regulate access, manipulate data, and analyze everything in between. It is a legacy big data is rapidly adopting for its own ends.
Big data is catching up with RDBMS on governance issues. It is a typical evolution process, Teplow said. "You get the core functionality you need. Nice things, like security and governance, come later."
The newer tools for big data "are not easy to use," said Robison. "It will take years for analytical tools to mature and become accessible to people who are not in data science."
Talk Is Cheap, Understanding Is Expensive
"I am not convinced people will stop worrying about the distinction," Brown said. Relational databases have been on the market for a long time. They have their share of supporters. Big data is the younger technology, with an equally fervid following.
Which brings us to users. They may not be conscious of which form of database technology they are using. "Users are not always clear [RDBMS and big data] are different products," Brown said. "The sales reps are steering them to whatever product they want [the users] to buy."
Vendors will want to offer RDBMS and big data products, because they want to be the one-stop shop for the corporate buyer, Brown said. Sales reps may not fully understand the products they are selling, while "shoppers focus on the brand," she added.
"It used to be that you could do everything with a relational database," Robison said. The inrush of varied data does not play well with RDBMS, so big data will become a necessity. Companies will embrace the new technology, but they will also be careful to minimize the variety of databases they have to manage. "They will choose some small number of databases to handle as many problems as they can," he said. Companies don't want the headache of managing 14 different databases, he added.
Big data is "the shiny new object," Teplow said. In the conventional narrative of IT, the new technology always disrupts the old one. "Disruption is newsworthy," he said. But big data is not completely disruptive. "There is no replacement of the transactional space." Relational databases are here to stay.