Big Data. Big Decisions
InformationWeek
Special Coverage Series


MapR's Google Deal Marks Second Big Data Cloud Win

Just two weeks after inking a deal with Amazon Web Services, MapR gets an exclusive to run Hadoop services on the Google Compute Engine.

June was a good month for Hadoop software distributor MapR, which landed not one but two high-profile deals to provide the software for Hadoop services in the cloud.

MapR's latest deal is tied to Google's June 28 announcement of Google Compute Engine, a new infrastructure-as-a-service (IaaS) offering that sets up the search giant as a public-cloud rival to Amazon Web Services (AWS). MapR is one of at least six partners debuting services on the Google infrastructure, which is in limited beta release. MapR and Google are signing up customers for a private preview of the Hadoop services that will run on Google Compute Engine.

News of the Google partnership came just two weeks after MapR and Amazon announced that services based on MapR's M3 and M5 Hadoop software distributions would be available on AWS. Where Amazon's own Elastic MapReduce service runs on Apache Hadoop, the MapR-based services add high-availability features not yet supported on the standard open source software.

A key appeal of the AWS and Google services will likely be the ability to process and analyze data that already resides in the cloud. The MapR-based services on AWS, for example, are integrated with Amazon's Simple Storage Service (S3) and DynamoDB NoSQL database. Google AdWords and Google (Web) Analytics are both potentially rich, high-volume sources of search and click-stream data that Google Compute Engine customers could presumably tap without costly and time-consuming data-integration and data-movement steps.

"The big challenges in media are figuring out who to target, when to target, appropriate price points, and appropriate keyword bids, so you could easily see related digital media and advertising analyses performed on Google's cloud," MapR VP of marketing Jack Norris told InformationWeek.

By tapping compute capacity on demand, customers could potentially save money if they experience peaks and valleys in capacity utilization. To gauge Google Compute Engine performance, Norris said, MapR recently set up a 1,256-node cluster running its beta Hadoop service and ran the industry-standard TeraSort benchmark. The cloud-based system completed the job in one minute and 20 seconds, according to Norris, whereas the world record is one minute and two seconds.

"The record was set on a system that had twice as many cores, four times the number of disks, 200 more servers than the system we put together on the Compute Engine, and the cost of the infrastructure was in the neighborhood of $5 million," Norris said. "For the test that we ran on the Google Compute Engine, the cost would be about $16."

Comparable tests of MapR-based Hadoop clusters have not been performed on Amazon's infrastructure, Norris said. In the case of AWS, companies use the S3 service to store everything from Web logs and click-through data to genomics data, and they use Amazon Elastic MapReduce and MapR-based Hadoop for analytics.

"The cloud is also an excellent target for business continuity, so instead of having a complete second data center, you can run Hadoop clusters in the cloud, with mirroring synchronized between your on-premises and cloud-based targets," Norris said.

Some analysts say cloud-based services will be prohibitively expensive for long-term storage at high scale, making them most attractive for pilot tests, brief projects, and cases where the data already exists in the cloud (as in the case of Google AdWords, Google Analytics, AWS S3, and DynamoDB). Norris took exception to that analysis.

"I think we're going to see generations of cloud services, and [costs at scale] are not going to be as much of a factor in the future," Norris said.

MapR distinguishes itself from Hadoop software distribution and support competitors Cloudera and Hortonworks by providing high-performance options not supported on standard Apache open source Hadoop software. MapR's M5 distribution, for example, replaces the Hadoop Distributed File System (HDFS) with a derivative of the Unix-based Network File System. M5 includes snapshotting, mirroring, and other high-availability features that aren't supported on the current (1.0) Hadoop code line.

MapR describes the AWS and Google services based on its distributions as an endorsement of its architecture, but there are plenty of options to run Cloudera and Hortonworks in the cloud. Hortonworks is the developer of the software used to run Hadoop on Microsoft's Azure public cloud. And multiple providers run Hadoop services on AWS and other public clouds using Cloudera's CDH Hadoop software distribution.

Responding to requests for comment on MapR's recent deals, Cloudera VP of product Charles Zedlewski said in a statement, "Cloudera has led the industry in support for Apache Hadoop on public clouds, supporting Rackspace, AWS, and Softlayer dating back to 2009. Every month, tens of thousands of CDH instances are created on top of various public cloud providers."

Zedlewski also noted that Cloudera developed Apache Whirr, software now used by Cloudera and its competitors to run Hadoop distributions on public clouds.

The entire Hadoop movement was actually inspired by Google, which was a pioneer in the use of MapReduce processing and published the white paper that guided the creators of Hadoop. Google still uses MapReduce processing extensively internally, but its software is not publicly released, and its approach to MapReduce is not offered as a service on the Google Compute Engine.

Pricing and service details have not been finalized for MapR's services on the Google Compute Engine. Basic compute pricing on the Compute Engine starts at $0.145 per hour for a single core with 3.75 gigabytes of memory. See our hands-on review of the Google Compute Engine private beta.
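As a rough sanity check, the starting price above can be combined with the benchmark figures Norris cited (1,256 nodes, one minute and 20 seconds) to estimate what such a run might cost. The sketch below is only illustrative. The article does not say how many cores each node had, so the core counts tried here are assumptions, as are the ideas that price scales linearly with core count and that usage is billed pro rata by the second. The function name `job_cost` is hypothetical.

```python
# Figures quoted in the article.
PRICE_PER_CORE_HOUR = 0.145  # USD, single core with 3.75 GB of memory
NODES = 1256                 # size of the benchmark cluster
JOB_SECONDS = 80             # one minute and 20 seconds


def job_cost(nodes, cores_per_node, seconds,
             price_per_core_hour=PRICE_PER_CORE_HOUR):
    """Estimate the compute cost of a job in USD.

    Assumes (not stated in the article) that price scales linearly
    with core count and that usage is billed pro rata by the second.
    """
    core_hours = nodes * cores_per_node * seconds / 3600
    return core_hours * price_per_core_hour


# The cores-per-node values are guesses; the article does not give one.
for cores in (1, 2, 4, 8):
    cost = job_cost(NODES, cores, JOB_SECONDS)
    print(f"{cores} core(s) per node: ${cost:.2f}")
```

Under these assumptions, four cores per node works out to roughly $16.19, which is in line with the "about $16" Norris quoted. Actual billing terms, such as minimum charges or rounding to billing increments, could change the result.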


