
Top Signs You Need NoSQL For Your Data

When your relational database takes longer to process your data than to collect it, it's time to call in big data technology, said panelists at Interop.

Not everyone is sure whether they have big data or not, or whether they need a NoSQL system to handle it. One way to find out, said one adopter of a NoSQL approach, is to ask yourself whether it is taking you longer to process your data than it did to collect it.

The Enterprise Cloud Summit, held Monday at Interop 2011 in Las Vegas, a UBM TechWeb event, called on Jeremy Edberg, senior product developer at Reddit.com, and Bradford Stephens, founder and CEO of Drawn to Scale, a big data consulting firm, to address the confusion.



Reddit.com is the social news site where anyone may submit a post of either self-created content or linked content and let other viewers vote on it. With enough positive votes versus negative, a blog, news story, or other item gets positioned on Reddit.com's front page.

Reddit.com collects so much information and records so many user interactions that Edberg realized at one point its relational database system was taking nearly as long to process the data as the site spent collecting it. He started tracking the processing time and later found that it was taking 25 hours to process data collected over 24 hours.

He concluded that the situation was untenable. If the time the database system took to extract, transform, and load the data was growing longer than the collection phase, "pretty soon we were going to be in the infinite pit of despair."
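The arithmetic behind that warning sign can be sketched in a few lines. This is an illustration only, not Reddit.com's tooling: the function name `backlog_hours` is hypothetical, and it assumes the processing time per day of data stays constant, whereas Edberg described it as growing.

```python
def backlog_hours(process_hours_per_day: float, days: int,
                  window_hours: float = 24.0) -> float:
    """Hours of unprocessed work remaining after `days` of collection.

    Assumption (not from the article): each day's data needs a fixed
    `process_hours_per_day` of processing, and the system can work at
    most `window_hours` per day.
    """
    backlog = 0.0
    for _ in range(days):
        backlog += process_hours_per_day            # new work arriving today
        backlog = max(0.0, backlog - window_hours)  # work finished today
    return backlog


# 25 hours of processing per 24 hours of data: the backlog never shrinks.
month_at_25 = backlog_hours(25, days=30)
# 23 hours of processing per 24 hours of data: the system keeps up.
month_at_23 = backlog_hours(23, days=30)
print(month_at_25, month_at_23)
```

Under these assumptions, a system that needs 25 hours per day of data falls one hour further behind every day, while one that needs 23 hours never accumulates a backlog.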

Stephens said his experience as lead platform engineer at Visible Technologies, a firm producing business intelligence for social media, was similar to Edberg's. The main problem is that relational databases function most effectively when they sit on one large server. Relational systems do not easily distribute data across a cluster without introducing latencies into the database's operations.

Stephens said he tried to solve the problem through sharding, or distributing subsets of data around a cluster, each with its own database system to manage it as a discrete unit, "but we still couldn't get reads fast enough."
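A minimal sketch of the sharding idea follows. It is not the setup Stephens used: the names `shard_for` and `ShardedStore` are hypothetical, and in-memory dictionaries stand in for the separate database systems that would each manage one subset of the data.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Each shard is a plain dict standing in for a discrete database."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value: str) -> None:
        # Only the shard that owns the key is written to.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        # Only the shard that owns the key is consulted.
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
store.put("post:1", "first post")
print(store.get("post:1"))
```

A lookup by key touches a single shard, but any query that spans keys has to visit every shard and merge the results in the application.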

"You know you have a big data problem when your hardware budget is growing exponentially," he said.

Edberg agreed. It's a problem, he said, "when you have so much data in the database that you keep hiring consultants and operations guys to mitigate the effects of periodic slowdowns." They can only do so much. They will postpone the next recurrence of the problem, not eliminate it, he said.

The litany of signs continued. You know you have a big data problem when "developers want to produce new features but they're spending more time maintaining the systems than working on them ... the engineers can't seem to deliver the system's potential."

Stephens said Visible Technologies combined database triggers and Python commands in its database system, and the two conflicted with each other. Waiting for the system to sort out the conflicts delayed its responses to SQL queries, he said. Also, triggers embedded in relational database systems don't scale well to handle big data.

Stephens and Edberg are experienced at using HBase, a NoSQL system based on Hadoop open source code, and Cassandra, another open source NoSQL system, and said they can be made to scale easily. Edberg referred to both Oracle and SQL Server as having issues with "scaling out" over many servers.

"Keep a pile of commodity hardware in the corner of the data center and call it up when you need it," he said. In effect, NoSQL systems "scale out" by adding server nodes and load balancing across them. Cassandra is designed to use many nodes, and can continue operating if the server in a node fails.
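The idea of spreading data over many nodes and surviving the loss of one can be sketched as follows. This is a toy model, not Cassandra's implementation or API: the `Cluster` class and its methods are hypothetical, and the simple modulo placement used here is a simplification of what production systems do.

```python
import hashlib


class Cluster:
    """Toy cluster that stores every key on `replicas` different nodes."""

    def __init__(self, node_names, replicas: int = 2) -> None:
        self.nodes = {name: {} for name in node_names}
        self.down = set()
        self.replicas = replicas

    def _owners(self, key: str):
        # Pick a starting node from a hash of the key, then take the
        # next nodes in sorted order as the additional replicas.
        names = sorted(self.nodes)
        start = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % len(names)
        count = min(self.replicas, len(names))
        return [names[(start + i) % len(names)] for i in range(count)]

    def fail(self, name: str) -> None:
        """Mark a node as unavailable."""
        self.down.add(name)

    def put(self, key: str, value: str) -> None:
        for name in self._owners(key):
            if name not in self.down:
                self.nodes[name][key] = value

    def get(self, key: str):
        # Any live replica holding the key can answer the read.
        for name in self._owners(key):
            if name not in self.down and key in self.nodes[name]:
                return self.nodes[name][key]
        return None


cluster = Cluster(["node-a", "node-b", "node-c"], replicas=2)
cluster.put("post:1", "first post")
cluster.fail(cluster._owners("post:1")[0])  # lose the first replica
print(cluster.get("post:1"))                # still served by the second
```

Because each key lives on two nodes in this sketch, a read still succeeds after one of them fails.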

HBase and Cassandra "are fantastic systems" but usually lack the ability to build indexes the way relational databases do. On the other hand, by employing many nodes in a cluster, NoSQL systems let applications built on top of them process immense amounts of data by subdividing the work among the nodes.
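Subdividing work among nodes can be illustrated with a small tally of votes. The sketch below is an assumption-laden example, not code from Reddit.com, HBase, or Cassandra: the record shape (a post ID paired with a +1 or -1 vote) and the function names are hypothetical, and the "nodes" are simulated in a single process.

```python
from collections import Counter


def partition(records, num_nodes: int):
    """Split the records into one slice per node."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def count_votes(records) -> Counter:
    """Work done independently on one node: tally its slice."""
    tally = Counter()
    for post_id, vote in records:
        tally[post_id] += vote
    return tally


def aggregate(records, num_nodes: int) -> Counter:
    """Merge the per-node tallies into a single result."""
    total = Counter()
    for part in partition(records, num_nodes):
        total.update(count_votes(part))  # Counter.update adds counts
    return total


votes = [("p1", 1), ("p2", 1), ("p1", -1), ("p1", 1), ("p2", 1)]
print(aggregate(votes, num_nodes=3))
```

Each slice is tallied without reference to the others, so in a real cluster the per-node step could run in parallel, with only the small tallies moving between machines.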

Edberg said HBase is fast on reads and slower on writes, which is good for social networking sites seeking to respond quickly to site visitors. Such sites lag momentarily in updating the database with the information collected from the most recent visitors.

That lag doesn't matter much when site visitors want to see the aggregate trends among those participating. "Being fast on reads is great for our (Reddit.com) customers," said Edberg. They want to see what everybody else thinks; visitors already know what they think.




