Commentary
10/4/2008

DBMS Future?

Earlier this week I read here a post called "DBMS Past, Present and Future". I thought it would be appropriate to (re)introduce an alternate future (which is already happening) for RDBMS use. The post below is actually a repost of something I wrote last year in my old DDJ blog, i.e., pre-Dobbs Code Talk (with apologies to those of you who already read it back then).

The title I used then was: The RDBMS Is Dead

Okay, now that I have your attention -- the RDBMS isn't dead yet, but we can see a whole class of applications (maybe a couple of classes) where the importance of the RDBMS as we know it today is greatly diminished.

In an article I posted recently on InfoQ (which I also mentioned in the post on eBay architecture last week), I discussed the notion of database denormalization on Internet-scale sites (such as Amazon, eBay, Flickr, etc.). One candidate for denormalization is immutable data, where there isn't much gain from normalization to begin with.

The other issue is entity representation vs. speed. The problem is that joins are slow, and sometimes you get to corners where, if you want any kind of decent speed, you need to denormalize. Todd Hoff notes this as well:

The problem is joins are relatively slow, especially over very large data sets, and if they are slow your website is slow. It takes a long time to get all those separate bits of information off disk and put them all together again. Flickr decided to denormalize because it took 13 Selects to each Insert, Delete or Update.
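
To make the trade-off concrete, here is a minimal sketch using Python's built-in sqlite3 (the schema is my own illustration, not Flickr's): the normalized tables need joins on every page view, while the denormalized photo_page table pays the cost once at write time so the hot read path is a single-table lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
    -- Normalized: reading a photo page needs joins across several tables.
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE photos (id INTEGER PRIMARY KEY, owner_id INTEGER, title TEXT);
    CREATE TABLE tags   (photo_id INTEGER, tag TEXT);

    -- Denormalized: the hot read path is a single-table lookup.
    CREATE TABLE photo_page (
        photo_id   INTEGER PRIMARY KEY,
        title      TEXT,
        owner_name TEXT,   -- copied from users at write time
        tag_csv    TEXT    -- pre-joined tag list
    );
""")

cur.execute("INSERT INTO users VALUES (1, 'alice')")
cur.execute("INSERT INTO photos VALUES (10, 1, 'sunset')")
cur.executemany("INSERT INTO tags VALUES (?, ?)", [(10, 'sky'), (10, 'beach')])

# The write path pays the extra cost of maintaining the copy...
cur.execute("""
    INSERT INTO photo_page
    SELECT p.id, p.title, u.name,
           (SELECT group_concat(tag) FROM tags WHERE photo_id = p.id)
    FROM photos p JOIN users u ON u.id = p.owner_id
    WHERE p.id = 10
""")

# ...so the read path needs no joins at all.
print(cur.execute(
    "SELECT title, owner_name, tag_csv FROM photo_page WHERE photo_id = 10"
).fetchone())
```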

The point, however, is that these "corner cases" are becoming more and more prevalent even in smaller-scale applications -- especially when you have complex entities (as is the case with defense systems, for example). Mats Helander recently wrote a post about saving entities to a BLOB and only adding fields as needed for indexing and identity purposes. Mats also suggests the semi-transparent middle way of using XML columns, where the database can do something with the otherwise opaque data.
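
A minimal sketch of that blob-plus-index-fields pattern, using sqlite3 and JSON serialization (the "tracks" entity is my own illustration, not from Mats' post): only identity and the fields we actually query on get real columns; the rest of the complex entity rides along as an opaque payload.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Only identity and the lookup fields get real, indexable columns.
cur.execute("""
    CREATE TABLE tracks (
        id      TEXT PRIMARY KEY,
        kind    TEXT,     -- indexed lookup field
        payload BLOB      -- everything else, serialized
    )
""")
cur.execute("CREATE INDEX ix_tracks_kind ON tracks(kind)")

entity = {"id": "t-17", "kind": "aircraft",
          "position": {"lat": 32.1, "lon": 34.8}, "speed_kts": 410,
          "sensors": [{"type": "radar", "confidence": 0.92}]}

cur.execute("INSERT INTO tracks VALUES (?, ?, ?)",
            (entity["id"], entity["kind"],
             json.dumps(entity).encode("utf-8")))

# Query by an indexed field, then rehydrate the full entity in code.
blob, = cur.execute(
    "SELECT payload FROM tracks WHERE kind = 'aircraft'").fetchone()
print(json.loads(blob)["position"])
```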

This, in fact, demonstrates that the relational data future is not totally secure: the leading databases have begun to treat XML data (which is hierarchical, not relational) as a first-class citizen -- to the point where we can even index XML data.
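
For example, SQL Server's XML columns and XML indexes work along these lines (shown through pyodbc for continuity with the other sketches; the connection string and the orders table are hypothetical, purely for illustration):

```python
import pyodbc  # assumes an ODBC connection to a SQL Server instance

conn = pyodbc.connect("DSN=mydb;UID=user;PWD=secret")  # placeholder credentials
cur = conn.cursor()

# The XML column is typed: the engine parses and validates it; it is not
# just an opaque string.
cur.execute("CREATE TABLE orders (id INT PRIMARY KEY, doc XML)")

# The hierarchical data can be indexed directly.
cur.execute("CREATE PRIMARY XML INDEX ix_orders_doc ON orders(doc)")

cur.execute("INSERT INTO orders VALUES (?, ?)",
            1, "<order><customer>acme</customer><total>99.5</total></order>")

# ...and queried inside, server-side, with XQuery.
cur.execute("""
    SELECT doc.value('(/order/total)[1]', 'DECIMAL(10,2)')
    FROM orders
    WHERE doc.exist('/order/customer[text()="acme"]') = 1
""")
print(cur.fetchone()[0])
conn.commit()
```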

So far we've seen a trend toward more denormalization and toward handling non-relational data. What else? Ah, transactions.

I've worked on several systems where the data was constantly updated and served as the system's representation of the world outside the system; the focus was on availability and latency. This is again aligned with the approach taken by the large Internet sites, which emphasize eventual consistency over immediate consistency.
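
A toy sketch of that trade-off, all in-memory and purely illustrative: the write is acknowledged as soon as the local replica applies it, the other replicas catch up asynchronously, and a read may briefly return stale data -- but the system stays available and writes stay fast.

```python
import queue
import threading
import time

class EventuallyConsistentStore:
    """Toy model: acknowledge a write once the local replica applies it,
    and propagate it to the other replicas in the background."""

    def __init__(self, n_replicas=3):
        self.replicas = [{} for _ in range(n_replicas)]
        self.log = queue.Queue()
        threading.Thread(target=self._replicate, daemon=True).start()

    def write(self, key, value):
        self.replicas[0][key] = value   # local replica: applied immediately
        self.log.put((key, value))      # others catch up asynchronously
        return "ack"                    # low write latency, no 2PC

    def _replicate(self):
        while True:
            key, value = self.log.get()
            time.sleep(0.05)            # simulated replication lag
            for replica in self.replicas[1:]:
                replica[key] = value

    def read(self, key, replica=2):
        # Any replica answers; it may be stale, but it is available.
        return self.replicas[replica].get(key)

store = EventuallyConsistentStore()
store.write("track-17", "position=32.1,34.8")
print(store.read("track-17"))   # likely None: replica 2 hasn't converged yet
time.sleep(0.2)
print(store.read("track-17"))   # now consistent
```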

In distributed systems, crashes happen. The RDBMS is a show-stopper when it comes to crashes -- if we can't commit, we need to stop and roll back; then maybe we can start over. Is this acceptable? There are many scenarios where it is not. I've seen it in defense systems, in communications systems, and even in e-commerce systems ("if you are not responsive, I'll just go to the competition").

What do you do in the presence of errors? Joe Armstrong suggests the following as the basis for Erlang in his thesis:

To make a fault-tolerant software system which behaves reasonably in the presence of software errors we proceed as follows:

1. We organize the software into a hierarchy of tasks that the system has to perform. Each task corresponds to the achievement of a number of goals. The software for a given task has to try and achieve the goals associated with the task. Tasks are ordered by complexity. The top level task is the most complex. When all the goals in the top level task can be achieved then the system should function perfectly. Lower level tasks should still allow the system to function in an acceptable manner, though it may offer a reduced level of service. The goals of a lower level task should be easier to achieve than the goals of a higher level task in the hierarchy.

2. We try to perform the top level task.

3. If an error is detected when trying to achieve a goal, we make an attempt to correct the error. If we cannot correct the error we immediately abort the current task and start performing a simpler task.
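
A minimal Python sketch of this recipe (the tasks and goals are my own illustration, not Armstrong's): try tasks from most to least complex, and when an error can't be corrected, abort the current task and fall back to the next simpler one.

```python
def full_service():
    """Top-level task: everything works, full level of service."""
    raise IOError("database commit failed")   # simulated fault

def degraded_service():
    """Simpler task: serve cached data, buffer writes for later."""
    return "serving reads from cache, buffering writes"

def read_only_service():
    """Simplest task: at least stay responsive."""
    return "read-only mode"

# Tasks ordered by complexity, most complex first (step 1).
TASKS = [full_service, degraded_service, read_only_service]

def run():
    for task in TASKS:                        # step 2: try the top-level task
        try:
            return task()                     # goals achieved: done
        except Exception as err:              # step 3: can't correct the error,
            print(f"{task.__name__} failed ({err}); degrading")
            continue                          # abort and try a simpler task
    return "total failure: no task achievable"

print(run())  # full_service fails, degraded_service takes over
```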

On top of that, we try to keep any update local, i.e., within a task boundary on the hardware where the task runs -- distributing the transactions is not a good option. I outlined why when I talked about SOA and cross-service transactions, and the reasoning holds here as well.

Well, truth be told, the RDBMS is not dead; its demise is probably not even around the corner. Nor does this mean that there aren't any uses for a database -- but the same is true for other architectural choices. Who ever said that a single-tier solution is not the right one for very specific types of systems?

RDBMSs succeeded in becoming the de facto standard for building systems because they offer some very compelling attributes -- ACID brings a lot of peace of mind. Large-scale systems, low-latency systems, and fault-tolerant systems opt for another set of compelling attributes (BASE). The point is that when you design your next solution, conventional database thinking is something you should at least give another thought, instead of just following dogma.
