Google Adds Big Data Services To Cloud Platform - InformationWeek
8/13/2015 10:15 AM

Google brings its own Cloud Dataflow and Cloud Pub/Sub big data services to Compute Engine and App Engine.


Google announced the general availability of two big data products formerly in beta: Google Cloud Dataflow and Google Cloud Pub/Sub. The two tools complete Google's plan to bring its entire suite of internal big data tools into general availability.

Cloud Dataflow is a Google service for processing big data, in both batch and streaming form, on Google Compute Engine and App Engine without the operational overhead of managing a large server cluster. Cloud Pub/Sub connects applications and services to real-time analysis of data streams.

The two products join BigQuery, Google's existing SQL-based system for analyzing large data streams and data sets.

Adding Cloud Dataflow and Cloud Pub/Sub puts Google on a more equal footing with Amazon Web Services, which has proven quick to introduce new cloud services. Cloud Dataflow has a rough counterpart in Amazon's existing Data Pipeline, Cloud Pub/Sub in Amazon Kinesis, and BigQuery in Amazon Redshift. Amazon also offers a Hadoop service, Elastic MapReduce.

Google's announcement said the two new services build on a decade of investment in data handling, including MapReduce for simple data processing on large clusters, FlumeJava for parallel data pipelines, and MillWheel for fault-tolerant processing of large data streams.

(Image: matdesign24/iStockphoto)

Google is also building lessons learned from its in-house data handling into the new products. Cloud Dataflow "is specifically designed to remove the complexity of developing separate systems for batch and streaming data sources by providing a unified programming model," the Google announcement said. Dataflow is fault-tolerant and highly available, and it is backed by a Google SLA.
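
The idea of a unified model can be illustrated with a toy example in which a transform is written once and then applied unchanged to a bounded (batch) source and to an unbounded (streaming) source. This is plain Python, not the Cloud Dataflow SDK: the functions `count_words`, `run_batch`, and `run_streaming` are hypothetical, and the sketch assumes a simplified streaming model that groups elements into fixed-size windows by count rather than by time.

```python
from collections import Counter
from collections.abc import Iterable, Iterator
from itertools import islice


def count_words(lines: Iterable[str]) -> Counter:
    """Hypothetical transform, written once for both batch and streaming input."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def run_batch(lines: Iterable[str]) -> Counter:
    # Bounded source: the whole data set is available, so apply the transform once.
    return count_words(lines)


def run_streaming(lines: Iterable[str], window_size: int) -> Iterator[Counter]:
    # Unbounded source: cut the stream into fixed-size windows (a simplifying
    # assumption; real streaming systems usually window by time) and apply the
    # same transform to each window as it fills.
    source = iter(lines)
    while True:
        window = list(islice(source, window_size))
        if not window:
            return
        yield count_words(window)


if __name__ == "__main__":
    lines = ["a b a", "b c", "c"]
    print(run_batch(lines))
    for window_counts in run_streaming(lines, window_size=2):
        print(window_counts)
```

Summing the per-window counts reproduces the batch result, which is the point of the sketch: the counting logic stays the same whether the source is bounded or not.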

According to the announcement, Cloud Dataflow runs two to three times faster than Hadoop on classic MapReduce-based pipelines such as PageRank and WordCount. In the cloud, better performance means less time on compute servers and therefore lower charges, Google said.

Google pointed out that the Cloud Dataflow SDK includes connectors to Salesforce, ClearStory, Tamr, SpringML, Cloudera, and data Artisans. Cloudera Director 1.5, now integrated with Google Cloud Platform, also became available on Wednesday, Aug. 12. Cloudera's Hadoop platform is now certified to run on Google's Compute Engine and App Engine, so users can run Hadoop clusters with Cloudera Enterprise software.

The Cloud Pub/Sub service is meant to let a cloud system deliver many messages to large numbers of users at high speed. Instead of a hard-wired, one-to-one queue, Pub/Sub lets a single message "fan out" to many subscribers at once, or lets multiple publishers "fan in" many messages at once. If recipients are not online when a message is sent, they will receive it as soon as they reconnect, Google said in the announcement.
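
The fan-out and fan-in behavior can be sketched as a toy, single-process model in which each subscription keeps its own backlog of messages. This is not the Cloud Pub/Sub client library: the `Topic` class and its `subscribe`, `publish`, and `pull` methods are hypothetical, and the sketch omits durability, acknowledgments, and everything else a real messaging service provides.

```python
from collections import deque


class Topic:
    """Hypothetical in-memory topic; each subscription gets its own copy of every message."""

    def __init__(self) -> None:
        self._backlogs: dict[str, deque[str]] = {}

    def subscribe(self, name: str) -> None:
        # Each subscription keeps a separate backlog, so a subscriber that is
        # offline still accumulates the messages published in the meantime.
        self._backlogs.setdefault(name, deque())

    def publish(self, message: str) -> None:
        # Fan-out: one published message is copied into every subscription's backlog.
        for backlog in self._backlogs.values():
            backlog.append(message)

    def pull(self, name: str) -> list[str]:
        # Delivery on reconnect: hand over everything queued since the last pull.
        backlog = self._backlogs[name]
        messages = list(backlog)
        backlog.clear()
        return messages


if __name__ == "__main__":
    topic = Topic()
    topic.subscribe("billing")
    topic.subscribe("analytics")
    # Fan-in: several publishers write to the same topic.
    for publisher in ("web", "mobile"):
        topic.publish(f"order from {publisher}")
    print(topic.pull("billing"))
    print(topic.pull("analytics"))
```

In this model, a subscription created after a message is published does not receive that message, because publishing only copies into backlogs that already exist.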

Pub/Sub, Cloud Dataflow, and other data services will be offered from Google data centers around the globe.

Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive ...

Comments
Gigi3, User Rank: Ninja
8/18/2015 | 6:24:22 AM
Re: Google Cloud Dataflow and Google Cloud Pub/Sub
"Google has made some valuable additions to its cloud portfolio as they continue to emphasize performance and scale. Google Cloud Dataflow can possibly replace Hadoop, as it enables the fusion of disparate sources of data in a single processing pipeline. On the other hand, Google Cloud Pub/Sub helps in managing data streams in real time."

RishabhSoft, can you explain how Google Dataflow is superior to Hadoop? What are its major attractions and drawbacks?
RishabhSoft, User Rank: Strategist
8/17/2015 | 3:55:21 AM
Google Cloud Dataflow and Google Cloud Pub/Sub

Google has made some valuable additions to its cloud portfolio as they continue to emphasize performance and scale. Google Cloud Dataflow can possibly replace Hadoop, as it enables the fusion of disparate sources of data in a single processing pipeline. On the other hand, Google Cloud Pub/Sub helps in managing data streams in real time.


The launch of both these services is going to strengthen Google's portfolio of cloud-based data analysis tools and complement Google BigQuery, the company's commercial service that processes large sets of unstructured data.


It would be interesting to see how Google leverages these new platforms going forward!

CharlesB21101, User Rank: Strategist
8/13/2015 | 6:11:36 PM
Google Big Data lineage
Through its creation of BigTable in 2004, Google was an innovator in Big Data before the term had come into being. MapReduce was created in part to load and modify data in BigTable, and MapReduce, of course, eventually led to Hadoop. A version of BigTable finally became public on May 6, one sign of how slow Google has been to bring some of its innovations into the public sphere and gain the credit and stature for having done so. BigTable underlies Google Datastore, announced as part of Google's Cloud Platform at Google I/O in May 2013.