A recent study released by big data visual tools developer Zoomdata shows that data analytics has crossed the digital Rubicon, with sources other than relational database management systems (RDBMS) now comprising 70% of analytics data sources.
According to the study, approximately 40% of data sources are now composed of modern non-RDBMS sources like Hadoop, NoSQL, in-memory, and search databases. Another 20% are columnar/MPP analytic databases, and 10% are cloud native data stores, such as Amazon Redshift and Google BigQuery. Only 30% of data analytics is still performed against traditional relational database management systems, the study notes. The research was conducted for Zoomdata by O'Reilly Media, with 875 respondents participating in the survey.
The dramatic shift toward non-RDBMS and away from traditional relational databases such as MySQL, PostgreSQL and SQLite3 for analytics applications isn't likely to reverse or even slow down given the way organizations now collect information. "Data is no longer homogenous, and the volume of data collected has grown exponentially over the last few decades," says Jamie Griffiths Craighead, a business and computer systems instructor at Beacon College in Leesburg, Fla. "Today, we have interconnected systems streaming non-normalized, non-homogenous data from many sources at a rate where there may be tens of millions of new records per day."
A prime force driving organizations away from RDBMS technology is the sheer scope and scale of the current data explosion. "We have more systems and devices generating more data than ever," says Tim Platt, vice president of IT business services at Virtual Operations, an IT support and management services provider based in Winter Park, Fla. "There’s a lot of (data); it takes diverse forms and it comes in quickly in many cases."
Traditional RDBMS systems have long struggled to accommodate scalability and scope issues, and are also dogged by numerous other disadvantages. "They are often licensed, proprietary software, with huge licensing fees tied to CPUs or cores and, even then, they struggle to scale horizontally," Platt explains. "Many times, the only option is to buy a bigger server with more CPU, more RAM, and more storage."
Worse yet, RDBMS's greatest strength -- data integrity -- has now become its biggest weakness. "To ensure consistent entry of data, (RDBMS) requires a strict data model enforced by tons of referential data relationship constraints," notes Gavin Woods, director of consulting at PITSS, a Troy, Mich.-based Oracle systems data conversion and modernization firm. Although still preferred in many use cases, RDBMS's data model burden emerges as a serious limitation in cases where an organization requires flexibility and databases that can be deployed over multiple instances nationwide or worldwide. "RDBMS does not fit this bill; enter the non-relational database," Woods says.
Non-RDBMS databases, such as NoSQL, offer a key benefit to application developers: ease of access. "Relational databases have a fraught relationship with applications written in object-oriented programming languages like Java, PHP and Python," observes Milind Shah, field CTO at cloud consulting services provider Stratiform of El Segundo, Calif. "NoSQL databases are often able to sidestep this problem through APIs, which allow developers to execute queries without having to learn SQL or understand the underlying architecture of their database system," he explains.
Instead of relying on tables, many non-RDBMS databases are document-oriented. "This way, non-structured data -- such as articles, photos, social media data, videos, or content within a blog post -- can be stored in a single document that can be easily found, but isn’t necessarily categorized into fields like a relational database does," Shah says. Such an approach is highly intuitive, yet storing vast amounts of data in bulk requires extra processing effort and more storage than storing highly organized data. "That’s why Hadoop, an open-source computing and data analysis platform capable of processing huge amounts of data in the cloud, is so popular in conjunction with NoSQL database stacks," Shah says.
Another key benefit is that many non-RDBMS databases can be made to scale horizontally instead of vertically, allowing relatively low-cost servers to be combined into a single, powerful cluster. "It’s generally more cost effective to stand up four eight-core servers than to stand up a single 32-core server," Platt says. "Therefore, it’s more cost effective to scale, but the other benefit is that the data -- and processing power -- can be partitioned in ways that it can be processed in parallel, which means incoming data can be processed quicker, or analysis queries can run quicker."
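The partitioning Platt describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four "nodes" are simulated with threads on one machine, and the names `node_for`, `partition`, `shard_total`, and the `device_id`/`reading` fields are hypothetical, not taken from any particular database product.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_NODES = 4  # hypothetical cluster size, echoing the "four servers" example


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a record key to a node index using a stable hash."""
    return crc32(key.encode("utf-8")) % num_nodes


def partition(records, num_nodes: int = NUM_NODES):
    """Split records into one list (shard) per node."""
    shards = [[] for _ in range(num_nodes)]
    for record in records:
        shards[node_for(record["device_id"], num_nodes)].append(record)
    return shards


def shard_total(shard):
    """Work that each node can do independently on its own shard."""
    return sum(record["reading"] for record in shard)


def parallel_total(records, num_nodes: int = NUM_NODES):
    """Compute per-shard totals concurrently, then combine the partial results."""
    shards = partition(records, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(shard_total, shards))
    return sum(partials)


# Hypothetical sample data: 1,000 sensor readings.
records = [{"device_id": f"sensor-{i}", "reading": i} for i in range(1000)]
print(parallel_total(records))
```

Because each shard holds a disjoint slice of the data, the per-shard work needs no coordination until the final combining step, which is what lets a cluster of smaller servers share the load.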
By working directly and natively with non-RDBMS data stores, data analysts can expand their skillsets and value. "For example, analytics users that understand how to leverage graph queries can derive deep network structure insight and wide relationship analysis over graphed data that simply can’t be computed on relational schema structured data," says Mike Matchett, a market analyst at research firm Small World Big Data, based in Hopkinton, Mass. "Non-RDBMS solutions can solve great performance challenges, tackle huge scales of data, help mine value from a wider variety of data types and are essential for web-scale, real-time, graph structured applications," he adds. Additionally, the overwhelming majority of non-RDBMS solutions are open source, allowing users to tackle vast and varied amounts of data not only directly, but also more cost-effectively.
RDBMS: Not dead yet
Although fading, RDBMS isn't likely to vanish anytime soon. Transactional consistency, in particular, remains a traditional RDBMS stronghold. "If your data is structured in a consistent fashion and you don’t have scalability issues, a traditional RDBMS might be the best solution," Platt says. He notes that it's also easier for organizations to find experienced traditional RDBMS database administrators, data modelers and developers. "The tool sets and features of these platforms are very mature," he notes.
RDBMS is also still king -- at least for the time being -- for organizations' core systems of record, which demand the exactness and certitude that RDBMS continues to offer. "The evolution here, though, is that an RDBMS doesn’t handle all data well, and data consumers will always want to work with as much data as they can," Matchett says.
Still, as time goes on, RDBMS's hold is rapidly weakening. "If your data requirements aren’t clear at the outset, or if you’re dealing with massive amounts of unstructured data, you may not have the luxury of developing a relational database with clearly defined schema," Shah says. Think of non-relational databases more like file folders, assembling related information of all types. "If a WordPress blog used a NoSQL database, each file could store data for a blog post: social likes, photos, text, metrics, links, and more," Shah says.
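Shah's blog example can be sketched as follows. This is a toy illustration, not a real database: the in-memory `posts` dictionary stands in for a document collection, and `save_post`, `load_post`, and every field name are hypothetical.

```python
import json

posts = {}  # hypothetical in-memory "collection", keyed by post ID


def save_post(post_id, document):
    """Store a whole post as one serialized JSON document."""
    posts[post_id] = json.dumps(document)


def load_post(post_id):
    """Fetch and deserialize a post by its ID."""
    return json.loads(posts[post_id])


# One document holds everything related to a post.
save_post("hello-world", {
    "title": "Hello, world",
    "text": "First post.",
    "photos": ["header.jpg"],
    "social_likes": 42,
    "links": ["https://example.com"],
})

# A second document can omit fields entirely; no shared schema is enforced.
save_post("quick-note", {"title": "Quick note", "text": "No photos here."})

print(load_post("hello-world")["social_likes"])
print("photos" in load_post("quick-note"))
```

The two posts carry different sets of fields, which is the flexibility Shah points to when requirements are unclear at the outset. The cost is that nothing stops inconsistent documents from being stored.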
To keep legacy RDBMS deployments alive, some tool providers have begun teaching new tricks to their old offerings. "Oracle’s MySQL has added some non-RDBMS-like capabilities in table fields that can be configured to store searchable JSON documents," Craighead says. Similarly, MongoDB, one of the most popular non-RDBMS offerings, can now store data in groups of JSON documents. "Non-relational data will continue to grow and we may see more hybrid database systems as traditional RDBMS add non-relational capabilities and non-relational systems add some features from traditional RDBMSes," Craighead observes.
Erik Gfesser, principal architect at Chicago-based IT consulting firm SPR Consulting, also sees a growing trend toward further hybridization. "Different types of processing, spanning the transactional to analytical spectrum, can be performed efficiently enough so that the need to use separate database products is lessened," he says.
Craighead notes that the trend toward non-RDBMS tools shouldn’t have any negative impact on data analytics users, since many analytics products now include support for non-relational data stores. "The positive impact for analytics users is the additional data that can be made available for analysis and increased query speed," Craighead says. "Non-RDBMS allows data to be stored in such a way that the need to perform join operations across tables or databases is reduced, leading to significant speed improvements."
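To make the point about joins concrete, here is a small comparison, offered as a sketch rather than a benchmark. The relational side uses Python's built-in `sqlite3` module; the document side is a plain dictionary standing in for a document store. The `customers`/`orders` schema and all names are hypothetical.

```python
import sqlite3

# Relational layout: customers and orders live in separate tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);
INSERT INTO customers VALUES (1, 'Ada');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0);
""")


def relational_total(customer_name):
    """Answering the question requires joining the two tables."""
    row = conn.execute(
        "SELECT SUM(o.total) FROM customers c "
        "JOIN orders o ON o.customer_id = c.id WHERE c.name = ?",
        (customer_name,),
    ).fetchone()
    return row[0]


# Document layout: each customer's orders are embedded in the customer record.
customer_docs = {
    "Ada": {
        "name": "Ada",
        "orders": [{"id": 10, "total": 25.0}, {"id": 11, "total": 40.0}],
    }
}


def document_total(customer_name):
    """One lookup returns everything needed; no join is involved."""
    return sum(o["total"] for o in customer_docs[customer_name]["orders"])


print(relational_total("Ada"), document_total("Ada"))
```

Both functions return the same total. The embedded layout avoids the join, though it typically does so by duplicating or nesting data, which can make updates and consistency checks harder.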
Some like it hot
Choosing between RDBMS and non-RDBMS requires carefully examining the analytics task at hand, as well as future analytical needs. "It’s common on development projects for someone to want to implement a NoSQL database for the sole reason that it’s a hot new technology," Platt says. Yet that’s never the right way to make a decision. Sometimes, the best decision is to choose both technologies. "We see projects that combine both relational DBs, where it makes sense, and NoSQL, where it makes sense," Platt says. "You don’t need a 'one or the other' approach."
The database product selection process should always take into account how the product is going to be used in the real world, as well as who will be expected to provide long-term maintenance. "Enterprises should be careful not to adopt technologies simply because they view them as being commonplace, or because a handful of individuals advocate usage," Gfesser says.
Performing due diligence before product selection will likely pay big dividends down the road. "As a consultant, I've seen many instances in which clients joined the bandwagon rather than first performing due diligence, and this typically doesn't end very well," Gfesser says. "As someone who periodically attends technology focused meetups, I'm reminded of a Hadoop consultant who last year commented to the audience that 'most Hadoop clusters out there are a mess; people do not know what they are doing'."