Big Data. Big Decisions
InformationWeek
Special Coverage Series


MongoDB Upgrade Fills NoSQL Analytics Void

Latest release of 10gen's database sidesteps complicated MapReduce processing with a new data-aggregation framework. That distances MongoDB from NoSQL rivals including Cassandra, HBase, and Riak.

10gen, the company behind the fast-growing MongoDB database, on Wednesday announced the general availability of a highly anticipated upgrade that promises easier analytic querying of a NoSQL database best known for speedy transactional performance.


The new release, MongoDB 2.2, is the production-ready result of a 2.1 developers' preview that has been beta tested by the MongoDB community since January. Key upgrades include a new real-time aggregation framework, new sharding and replication features for multi-data-center deployments, and improved performance and database concurrency for high-scale deployments.

The biggest news in the upgrade is clearly the new real-time data aggregation framework, which lets users directly query data within MongoDB without resorting to writing and running complicated, batch-oriented MapReduce jobs within the database.

"MapReduce works well when it's a complex analysis that you need to handle with batch processing, but if you're trying to do something simple like compute the average of a list of numbers, it's overkill," explained Jared Rosoff, director of product marketing at 10gen in an interview with InformationWeek.

What was missing before 2.2, and indeed in most NoSQL databases, according to Rosoff, is routine query functionality for the kind of data-filtering and data-analysis tasks you would otherwise handle with SQL--that is, if you were using a relational database. That's exactly what the data aggregation framework provides: a collection of data operators that can handle 80% of the tasks that MongoDB developers used to handle with MapReduce, according to 10gen.
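To make Rosoff's "overkill" point concrete, here is a minimal in-memory Python sketch that computes a per-customer average twice: once in a MapReduce style (map, reduce, finalize) and once as a simple grouping. It does not use MongoDB or any MongoDB API; the sample documents and all function and field names are assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample documents; field names are illustrative only.
orders = [
    {"customer": "a", "total": 10.0},
    {"customer": "b", "total": 30.0},
    {"customer": "a", "total": 20.0},
]


def map_phase(doc):
    # Emit a (key, value) pair carrying a partial sum and a count.
    return doc["customer"], {"sum": doc["total"], "count": 1}


def reduce_phase(values):
    # Combine the partial sums and counts emitted for one key.
    return {
        "sum": sum(v["sum"] for v in values),
        "count": sum(v["count"] for v in values),
    }


def average_with_map_reduce(docs):
    emitted = defaultdict(list)
    for doc in docs:
        key, value = map_phase(doc)
        emitted[key].append(value)
    reduced = {key: reduce_phase(values) for key, values in emitted.items()}
    # Finalize step: turn each (sum, count) pair into an average.
    return {key: r["sum"] / r["count"] for key, r in reduced.items()}


def average_with_grouping(docs):
    # The declarative equivalent: group by key, then apply an average.
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["customer"]].append(doc["total"])
    return {key: mean(values) for key, values in groups.items()}


mr_result = average_with_map_reduce(orders)
agg_result = average_with_grouping(orders)
```

Both functions return the same averages; the grouping version simply needs far less ceremony, which is the gap the article says the aggregation framework is meant to close.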


The MongoDB query language is not SQL, but 10gen describes it as a simple, expressive language with a straightforward syntax for efficient querying. Examples of simple operators include "sum," "min," "max," and "average" (written "avg" in MongoDB's syntax). These sorts of operators would be familiar to any database veteran or analyst, and they're applied in a real-time data-processing pipeline that delivers sub-second performance, according to 10gen.

Other available pipeline operators include "project," which is used to select desired attributes and ignore everything else. "Group" combines documents that share a desired attribute so that values can be aggregated per group. "Match" is a filter that can be used to eliminate documents from a query. "Limit," "skip," and "sort" are used in much the same way they're used in SQL: to limit a query to a desired number of results, to skip over a given number of results, and to sort results alphabetically, numerically, or by some other value.
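The operators described above can be pictured as stages that each take a list of documents and hand a new list to the next stage. The Python sketch below emulates that idea in memory, under the assumption that each stage is an ordinary function; it is not MongoDB's implementation or driver API, and the sample data, function names, and the "_id" label on grouped rows are illustrative choices.

```python
from collections import defaultdict


def match(docs, predicate):
    # Keep only the documents that satisfy the filter.
    return [d for d in docs if predicate(d)]


def project(docs, fields):
    # Keep only the named attributes and ignore everything else.
    return [{f: d[f] for f in fields if f in d} for d in docs]


def group(docs, key, accumulators):
    # Bucket documents by `key`, then apply each accumulator per bucket.
    buckets = defaultdict(list)
    for d in docs:
        buckets[d[key]].append(d)
    rows = []
    for value, members in buckets.items():
        row = {"_id": value}
        for name, (field, fn) in accumulators.items():
            row[name] = fn([m[field] for m in members])
        rows.append(row)
    return rows


def sort(docs, field, descending=False):
    return sorted(docs, key=lambda d: d[field], reverse=descending)


def skip(docs, n):
    return docs[n:]


def limit(docs, n):
    return docs[:n]


# Hypothetical sales documents; note that fields may differ per document.
sales = [
    {"region": "east", "amount": 100, "status": "paid", "note": "rush"},
    {"region": "west", "amount": 50, "status": "paid"},
    {"region": "east", "amount": 40, "status": "void"},
    {"region": "west", "amount": 70, "status": "paid"},
    {"region": "north", "amount": 20, "status": "paid"},
]

stage = match(sales, lambda d: d["status"] == "paid")
stage = project(stage, ["region", "amount"])
stage = group(
    stage,
    "region",
    {
        "total": ("amount", sum),
        "largest": ("amount", max),
        "average": ("amount", lambda values: sum(values) / len(values)),
    },
)
stage = sort(stage, "total", descending=True)
stage = skip(stage, 0)
result = limit(stage, 2)
```

Here `result` holds the two regions with the highest paid totals. Filtering and projecting before grouping keeps the later stages working on less data, which is the usual reason for ordering a pipeline this way.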

SQL veterans might ask, "Why not just use a relational database?" Rosoff says MongoDB is displacing products like Oracle Database and Microsoft SQL Server because of its scalability and flexibility. MongoDB runs on low-cost, highly distributed nodes of commodity hardware much like Hadoop, but unlike that data-processing platform, it's a database that can run applications.

Like other NoSQL databases, MongoDB gives users the flexibility to store and recall any type of data without the rigid constraints of a fixed data model--something that relational databases demand. New data types including complex data and loosely structured textual information can be added without first conforming the data to a predefined schema.

"Customers frequently tell us they've spent as long as a year trying to model complicated schemas in relational databases but they just couldn't make it work or perform," Rosoff said. "People are adopting Mongo because every document stored in the database can have slightly different fields, and documents can have more structure than rows in a relational database."

A good use case for NoSQL is modeling a product catalog for an e-commerce site. If that site sells books, shoes, furniture, and MP3s, the catalog will require many different fields to cover diverse product attributes, but at the same time, all of those products have product IDs, prices, and descriptions. That's hard to structure in a relational database, but "you can model that type of data much more simply in Mongo," Rosoff said.
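The catalog example can be sketched as plain Python dictionaries standing in for documents: every product carries the shared fields (ID, price, description), while each product type adds its own attributes. The field names and values below are hypothetical, and the two helper functions are illustrative, not part of any MongoDB API.

```python
# Fields the article says every product shares; the names are assumptions.
REQUIRED_FIELDS = {"product_id", "price", "description"}

# Hypothetical catalog: one collection, four differently shaped documents.
catalog = [
    {"product_id": "B-1", "price": 12.99, "description": "Paperback novel",
     "author": "A. Writer", "pages": 320},
    {"product_id": "S-1", "price": 59.00, "description": "Running shoes",
     "sizes": [8, 9, 10], "color": "blue"},
    {"product_id": "F-1", "price": 249.00, "description": "Oak desk",
     "dimensions_cm": {"width": 120, "depth": 60, "height": 75}},
    {"product_id": "M-1", "price": 0.99, "description": "Single track",
     "artist": "The Band", "duration_sec": 215},
]


def missing_required(doc):
    # Shared fields that this document lacks.
    return REQUIRED_FIELDS - set(doc)


def extra_fields(doc):
    # Type-specific fields beyond the shared ones.
    return sorted(set(doc) - REQUIRED_FIELDS)


all_valid = all(not missing_required(doc) for doc in catalog)
extras_by_id = {doc["product_id"]: extra_fields(doc) for doc in catalog}
```

The desk's nested `dimensions_cm` value illustrates Rosoff's point that documents can have more structure than rows; a relational design would typically need extra tables or many nullable columns to hold the same variety.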

The new aggregation framework promises to fill the need for fast, simple querying in MongoDB, but more complex analyses can still be handled with MapReduce processing within the database. And for really complex data processing and analyses, there's a MongoDB-Hadoop connector that lets users handle those tasks on separate Hadoop clusters.

New multi-data-center support features included in the 2.2 release give administrators tighter control over data location to meet compliance demands. For example, certain privacy regulations in Europe demand that customer data be stored within the country or continent. Tag-aware database sharding and replication features in 2.2 support location-based storage and retention. In addition, different types of data can be assigned to content-appropriate hardware, as in fast storage for frequently accessed data and low-cost options for archival information.
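One way to picture tag-aware placement is as two lookups: ranges of a shard key map to a tag, and a tag maps to the shards allowed to hold that data. The Python sketch below models only that idea. The shard names, tags, key layout, and the rule that untagged keys may live on any shard are assumptions for illustration; this is not MongoDB's configuration syntax or balancer logic.

```python
# Hypothetical shards and the tags an administrator has assigned to them.
shard_tags = {
    "shard-eu-1": {"EU"},
    "shard-us-1": {"US"},
    "shard-us-2": {"US"},
}

# Hypothetical tag ranges over a (region, customer_id) shard key.
# Each range is (lower bound inclusive, upper bound exclusive, tag).
tag_ranges = [
    (("EU", 0), ("EU", float("inf")), "EU"),
    (("US", 0), ("US", float("inf")), "US"),
]


def tag_for(key):
    # Return the tag whose range covers this shard key, if any.
    for lower, upper, tag in tag_ranges:
        if lower <= key < upper:
            return tag
    return None


def eligible_shards(key):
    # Shards permitted to store the document with this shard key.
    tag = tag_for(key)
    if tag is None:
        # Assumption: data outside every tagged range may go anywhere.
        return sorted(shard_tags)
    return sorted(name for name, tags in shard_tags.items() if tag in tags)


eu_placement = eligible_shards(("EU", 42))
us_placement = eligible_shards(("US", 7))
other_placement = eligible_shards(("JP", 1))
```

In this sketch a European customer's record can only land on `shard-eu-1`, which is the kind of location guarantee the compliance example calls for. The same mechanism could tag shards by hardware class instead of geography.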

MongoDB 2.2 performance and concurrency are said to be improved with a new locking architecture that 10gen says better handles frequent database reads and writes. Locking protects data integrity by ensuring that one operation is completed before another can update the same information. By using a more fine-grained locking approach and detecting when data is on disk rather than in RAM, 10gen says MongoDB 2.2 handles more disk input and output demands under load without degrading database performance.
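The general benefit of finer-grained locking can be shown with a small Python sketch in which each named database has its own lock, so writers to different databases do not wait on each other while writers to the same database are still serialized. The `Store` class and the one-lock-per-database granularity are assumptions chosen for illustration. The sketch does not reproduce MongoDB's internals, and it does not model the disk-versus-RAM detection mentioned above.

```python
import threading


class Store:
    """Toy in-memory store with one lock per database name (hypothetical)."""

    def __init__(self):
        self._data = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects the two dicts above

    def _lock_for(self, db_name):
        # Create the database and its lock on first use.
        with self._guard:
            if db_name not in self._locks:
                self._locks[db_name] = threading.Lock()
                self._data[db_name] = {}
            return self._locks[db_name]

    def increment(self, db_name, key):
        # The read-modify-write is atomic with respect to this database only.
        with self._lock_for(db_name):
            current = self._data[db_name].get(key, 0)
            self._data[db_name][key] = current + 1

    def get(self, db_name, key):
        with self._lock_for(db_name):
            return self._data[db_name].get(key, 0)


def worker(store, db_name, count):
    for _ in range(count):
        store.increment(db_name, "hits")


store = Store()
threads = [
    threading.Thread(target=worker, args=(store, db_name, 1000))
    for db_name in ("sales", "sales", "logs", "logs")
]
for t in threads:
    t.start()
for t in threads:
    t.join()

sales_hits = store.get("sales", "hits")
logs_hits = store.get("logs", "hits")
```

With a single global lock, all four threads would queue behind one another; here only threads that touch the same database contend, yet no increment is lost.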

The performance gains and multi-data-center support features are table stakes for big data deployments that 10gen had to deliver. The data aggregation framework distances MongoDB from NoSQL competitors including Cassandra, HBase, and Riak, according to Rosoff. Gartner analyst Merv Adrian told InformationWeek he's cautiously optimistic that 10gen will deliver what's promised.

"Time will tell if 10gen's '80% of the use cases' assertion proves out, but there is no doubt that grouping and aggregation functions do make up a lot of the intended [analytic] work in their customer and prospect base," Adrian said.



