Big Data. Big Decisions
InformationWeek
Special Coverage Series


Big Data Gets A Closer Look From Riak

Database system adds fine-grained queries of relational systems to NoSQL approach.

Riak 1.0, a NoSQL system with Cassandra characteristics but a lighter, slimmer profile, was announced Tuesday. It can take huge slices of unstructured data and reduce them to much more manageable, bite-sized chunks.

Cassandra has proven adept at handling really heavyweight jobs, such as serving as the datastore for Facebook users. Riak fits lighter-weight but highly interactive roles. It's in use at Comcast, Yammer, ClipBoard.com, and Denmark's health system.

Riak has also gained favor, even as a latecomer in the field of NoSQL systems, in part because it consumes fewer resources in getting its finer-grained results. Its queries can have more of the specificity of SQL queries without picking up the performance drawbacks of relational databases, said Tony Falco, COO of Basho, the firm that produces Riak.

"It's very efficient at capturing the data of a user session. Comcast uses Riak for managing content streaming for users of its Xfinity TV service," said Falco in an interview.

Mozilla considered HBase, Cassandra, and Riak, and ended up selecting Riak for its Test Labs Pilot project analyzing user data obtained through use of its browser. Riak was selected to capture sessions of 10 million users over a two-day period, amounting to 1.2 TB of data, wrote Daniel Einspanjer, lead developer for the Mozilla metrics team, in a May 10 blog post. As a well-tested, REST-based system, Riak required less manpower, he concluded, and it was "much lighter on memory requirements."

Like HBase and Cassandra, Riak is a key-value store that can collect unstructured data and store it as objects, each under its own key, that can then be queried. It's also highly scalable: it can distribute itself over a server cluster and add new servers as needed while maintaining its own high availability.
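The article doesn't describe how Riak spreads data over a cluster. Riak's design follows Amazon's Dynamo, which relies on consistent hashing, so the Python sketch below illustrates that general technique. It is a simplified, hypothetical example rather than Riak's code: Riak itself divides its ring into a fixed number of partitions, and the `HashRing` class, server names and key names here are invented for illustration. It shows why adding a server moves only a fraction of the stored keys, and moves them only onto the new server.

```python
import bisect
import hashlib


class HashRing:
    """Simplified consistent-hashing ring (hypothetical, not Riak's code)."""

    def __init__(self, points_per_server=8):
        self.points_per_server = points_per_server
        self._points = []  # sorted positions on the ring
        self._owners = {}  # ring position -> server name

    @staticmethod
    def _hash(text):
        # SHA-1 maps any string to a 160-bit integer position on the ring.
        return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)

    def add_server(self, server):
        # Place several points per server so keys spread more evenly.
        for i in range(self.points_per_server):
            point = self._hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owners[point] = server

    def server_for(self, bucket, key):
        # A key belongs to the first server point at or after its hash,
        # wrapping around to the start of the ring if necessary.
        if not self._points:
            raise LookupError("ring has no servers")
        pos = self._hash(f"{bucket}/{key}")
        idx = bisect.bisect_left(self._points, pos) % len(self._points)
        return self._owners[self._points[idx]]


ring = HashRing()
for name in ("node-a", "node-b", "node-c"):
    ring.add_server(name)

keys = [f"session-{n}" for n in range(1000)]
before = {k: ring.server_for("sessions", k) for k in keys}

ring.add_server("node-d")  # grow the cluster by one server
after = {k: ring.server_for("sessions", k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Because a new server only inserts new points on the ring, any key whose owner changes must now belong to that new server; every other key stays where it was, which is what lets a cluster grow without reshuffling all of its data.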

The 1.0 version includes a new feature, secondary indices, which lets a Riak user retrieve data using compound criteria. For example, customers between the ages of 17 and 22 in certain states or regions of the country can be identified, instead of just all the customers in a given state or all customers within the age range.
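To make the age-and-state example concrete, here is a hypothetical in-memory Python sketch of a key-value store that also maintains secondary indices. The `IndexedStore` class and its methods are invented for illustration and are not Riak's client API; the `_int` and `_bin` suffixes merely echo Riak's naming convention for integer and binary indexes. Each criterion is answered from one index and the resulting key sets are intersected, which is an assumption of this sketch rather than a description of how Riak evaluates such queries.

```python
from collections import defaultdict


class IndexedStore:
    """Toy in-memory key-value store with secondary indices (hypothetical)."""

    def __init__(self):
        self._objects = {}                 # primary key -> stored object
        self._index_terms = {}             # primary key -> {index name: term}
        self._indices = defaultdict(dict)  # index name -> {term: set of keys}

    def put(self, key, obj, indexes):
        # Drop index entries left over from any earlier version of this key.
        for name, term in self._index_terms.get(key, {}).items():
            self._indices[name][term].discard(key)
        self._objects[key] = obj
        self._index_terms[key] = dict(indexes)
        for name, term in indexes.items():
            self._indices[name].setdefault(term, set()).add(key)

    def get(self, key):
        return self._objects[key]

    def query_exact(self, index, term):
        # Keys whose entry in `index` equals `term`.
        return set(self._indices[index].get(term, ()))

    def query_range(self, index, low, high):
        # Keys whose entry in `index` falls in [low, high], inclusive.
        return {
            key
            for term, keys in self._indices[index].items()
            if low <= term <= high
            for key in keys
        }


store = IndexedStore()
store.put("c1", {"name": "Ana"}, {"age_int": 19, "state_bin": "OR"})
store.put("c2", {"name": "Ben"}, {"age_int": 30, "state_bin": "OR"})
store.put("c3", {"name": "Cy"}, {"age_int": 21, "state_bin": "WA"})
store.put("c4", {"name": "Di"}, {"age_int": 17, "state_bin": "CA"})

# Customers aged 17 to 22 who live in Oregon or Washington.
in_age_range = store.query_range("age_int", 17, 22)
in_states = store.query_exact("state_bin", "OR") | store.query_exact("state_bin", "WA")
matches = sorted(in_age_range & in_states)
print(matches)  # ['c1', 'c3']
```

A plain key lookup could only fetch `c1` by name; the indices let the same data be reached by attribute, which is the kind of finer-grained retrieval the feature is meant to provide.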

Another Riak feature is Riak Pipe, an implementation of MapReduce that distributes a task across cluster nodes in the way that is most efficient for handling the relevant data. In effect, Riak Pipe sends a Riak query to a node close to the data so that it executes most efficiently on the cluster.
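Riak Pipe is described here only at a high level. The following Python sketch shows the general MapReduce pattern it builds on, with the map step run against each node's local data so that only small summaries travel back to be reduced. The cluster layout, node names and record fields are hypothetical, and everything runs in a single process; it is not Riak Pipe's API.

```python
from collections import Counter
from functools import reduce

# Hypothetical cluster: each node holds its own slice of session records.
cluster = {
    "node-a": [{"user": "u1", "pages": 5}, {"user": "u2", "pages": 3}],
    "node-b": [{"user": "u1", "pages": 2}],
    "node-c": [{"user": "u3", "pages": 7}, {"user": "u2", "pages": 1}],
}


def map_phase(records):
    """Runs where the records live; emits a compact per-node summary."""
    counts = Counter()
    for rec in records:
        counts[rec["user"]] += rec["pages"]
    return counts


def reduce_phase(partials):
    """Combines the per-node summaries into a single result."""
    return reduce(lambda acc, c: acc + c, partials, Counter())


# Only the small Counter objects, not the raw records, leave each node.
partials = [map_phase(records) for records in cluster.values()]
totals = reduce_phase(partials)
print(dict(totals))  # {'u1': 7, 'u2': 4, 'u3': 7}
```

In a real cluster the map calls would run in parallel on the nodes themselves; the list comprehension simply stands in for that step.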

Falco said NoSQL systems are good for collecting masses of data and then making chunks of that data available to hundreds or thousands of users at a time, often those visiting a website. Even so, business users often want to submit queries and retrieve data that is more specific than a named key value (an identifier for a particular class of data, such as "customers") allows. Much valuable information is buried in captured website user sessions, for example. "Once all that data is in there, you want to be able to get it out," Falco noted.

Riak is written in Erlang, a language that gives a system built-in support for distribution across a server cluster, fault tolerance, and an ability to absorb new hardware being added to the cluster without disrupting operations.

Riak is available as open source code under the Apache 2 license or in Basho's commercially supported version. Average deal size for small and midsize businesses runs about $35,000, Falco said. Fortune 1000 firms pay $3,995 a node, he added. A version for startups is available for $20,000.

Basho is a little-known NoSQL firm that got a high-profile CEO, Donald Rippert, the former CTO of Accenture, in June. Basho was formed in San Francisco in 2008 with a senior management team of veterans from Akamai Technologies, the content distribution network, including Falco. Falco is the former VP of product management at Akamai and, before that, Akamai's VP of technical services.

The firm is backed by the venture capital firms Georgetown Partners and Trifork, which provided a second round of $7.5 million in February.
