
MapR's Google Deal Marks Second Big Data Cloud Win

Doug Henschen, InformationWeek | July 03, 2012 10:42 AM


June was a good month for Hadoop software distributor MapR, landing not one, but two high-profile deals to provide the software for Hadoop services in the cloud.

MapR's latest deal is tied to Google's big June 28 announcement of the Google Compute Engine, new infrastructure-as-a-service (IaaS) that sets up the search giant as a public-cloud rival to Amazon Web Services (AWS). MapR is one of at least six partners debuting services on the Google infrastructure, which is currently in limited beta release. MapR and Google are currently signing up customers to join a private preview of the Hadoop services that will run on Google Compute Engine.

News of the Google partnership came just two weeks after MapR and Amazon announced that services based on MapR's M3 and M5 Hadoop software distributions would be available on AWS. Whereas Amazon's own Elastic MapReduce service runs on Apache Hadoop, the MapR-based services add high-availability features not yet supported in the standard open source software.

A key appeal of the AWS and Google services will likely be the ability to process and analyze data that already resides in the cloud. The MapR-based services on AWS, for example, are integrated with Amazon's Simple Storage Service (S3) and DynamoDB NoSQL database. Google AdWords and Google (Web) Analytics are both potentially rich, high-volume sources of search and click-stream data that Google Compute Engine customers could presumably tap without costly and time-consuming data-integration and data-movement steps.

"The big challenges in media are figuring out who to target, when to target, appropriate price points, and appropriate keyword bids, so you could easily see related digital media and advertising analyses performed on Google's cloud," MapR VP of marketing Jack Norris told InformationWeek.

[ Want more on Google's new public cloud infrastructure? Read Google Compute Engine: Hands-On Review. ]

By tapping compute capacity on demand, customers could potentially save money if they experience peaks and valleys in capacity utilization. Norris said MapR recently tested the performance of its beta Hadoop service on Google Compute Engine by setting up a 1,256-node cluster and running the industry-standard TeraSort benchmark. The cloud-based system completed the job in one minute and 20 seconds, according to Norris, whereas the world record stands at one minute and two seconds.

"The record was set on a system that had twice as many cores, four times the number of disks, 200 more servers than the system we put together on the Compute Engine, and the cost of the infrastructure was in the neighborhood of $5 million," Norris said. "For the test that we ran on the Google Compute Engine, the cost would be about $16."

Comparable tests of MapR-based Hadoop clusters have not been performed on Amazon's infrastructure, Norris said. In the case of AWS, companies use S3 to store everything from Web logs and click-through data to genomics data, and they use Amazon Elastic MapReduce and MapR-based Hadoop to analyze it.

"The cloud is also an excellent target for business continuity, so instead of having a complete second data center, you can run Hadoop clusters in the cloud, with mirroring synchronized between your on-premises and cloud-based targets," Norris said.

Some analysts say cloud-based services will be prohibitively expensive for long-term storage at high scale, making them most attractive for pilot tests, brief projects, and cases where the data already exists in the cloud (as in the case of Google AdWords, Google Analytics, AWS S3, and DynamoDB). Norris took exception to that analysis.

"I think we're going to see generations of cloud services, and [costs at scale] are not going to be as much of a factor in the future," Norris said.

MapR distinguishes itself from Hadoop software distribution and support competitors Cloudera and Hortonworks by providing high-performance options not supported in standard Apache open source Hadoop software. MapR's M5 distribution, for example, replaces the Hadoop Distributed File System (HDFS) with a derivative of the Unix-based Network File System. M5 includes snapshotting, mirroring, and other high-availability features that aren't supported in the current (1.0) Hadoop code line.

MapR describes the AWS and Google services based on its distributions as an endorsement of its architecture, but there are plenty of options to run Cloudera and Hortonworks in the cloud. Hortonworks is the developer of the software used to run Hadoop on Microsoft's Azure public cloud. And multiple providers run Hadoop services on AWS and other public clouds using Cloudera's CDH Hadoop software distribution.

Responding to requests for comment on MapR's recent deals, Cloudera VP of product Charles Zedlewski said in a statement, "Cloudera has led the industry in support for Apache Hadoop on public clouds, supporting Rackspace, AWS, and Softlayer dating back to 2009. Every month, tens of thousands of CDH instances are created on top of various public cloud providers."

Zedlewski also noted that Cloudera developed Apache Whirr, software now used by Cloudera and its competitors to run Hadoop distributions on public clouds.

The entire Hadoop movement was actually inspired by Google, which was a pioneer in the use of MapReduce processing and published the white paper that guided the creators of Hadoop. Google still uses MapReduce processing extensively internally, but its software is not distributed and its approach to MapReduce is not made available as a service on the Google Compute Engine.

Pricing and service details have not been finalized for MapR's services on the Google Compute Engine. Basic compute pricing on the Compute Engine starts at $0.145 per hour for a single core with 3.75 gigabytes of memory. See our hands-on review of the Google Compute Engine private beta.
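Norris's $16 figure and the $0.145 base rate can be tied together with a rough calculation. The sketch below is a back-of-envelope estimate, not MapR's or Google's method. It assumes each of the 1,256 nodes was a four-core instance billed at four times the single-core rate, and that billing covered only the one-minute-20-second runtime. The article gives neither the instance type nor the billing granularity, so both are assumptions.

```python
# Back-of-envelope cost check for the 1,256-node TeraSort run described above.
# ASSUMPTIONS (not stated in the article): each node is a hypothetical 4-core
# instance billed at 4x the quoted single-core rate, and billing covers only
# the job's runtime (real billing granularity may have been coarser).

NODES = 1256               # cluster size reported by MapR
CORE_HOUR_RATE = 0.145     # USD per core-hour, Compute Engine base price
CORES_PER_NODE = 4         # hypothetical: assumed 4-core instances
RUNTIME_SECONDS = 60 + 20  # 1 minute 20 seconds


def run_cost(nodes, cores_per_node, rate_per_core_hour, runtime_seconds):
    """Estimate the USD cost of running `nodes` machines for `runtime_seconds`."""
    core_hours = nodes * cores_per_node * runtime_seconds / 3600
    return core_hours * rate_per_core_hour


estimate = run_cost(NODES, CORES_PER_NODE, CORE_HOUR_RATE, RUNTIME_SECONDS)
print(f"Estimated cost: ${estimate:.2f}")
```

Under those assumptions the script prints an estimate of $16.19, close to Norris's figure. With single-core nodes the same arithmetic gives about $4, so the result is highly sensitive to the assumed core count.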
