Google Adds To BigQuery Big Data Capabilities

Google expands the capabilities of its BigQuery system to allow real-time data stream processing and event analysis.

Charles Babcock, Editor at Large, Cloud

April 20, 2015

3 Min Read
(Image: Google)

8 Google Projects To Watch in 2015


Google has announced updates to Google BigQuery and Cloud Dataflow -- the search giant's two big data management systems that compete with Amazon Web Services' DynamoDB and Data Pipeline.

In a blog post, Google's William Vambenepe, lead product manager for big data on Google's Cloud Platform, claimed Google has implemented a more thorough "cloud way" of managing big data than other IaaS providers. By that, Vambenepe means the service is provided without the user needing to know anything about how it's deployed, scaled, or managed, making it a "NoOps" service.

In one update to BigQuery, Google has introduced row-level permissions, a finer-grained approach to granting access to data in a database, according to Vambenepe. With row-level permissions, it's possible to grant a user access to a particular type of data in a database without opening up neighboring data to inspection.

Row-level permissions make it easier to share internal data with a variety of users. Partners or other parties outside the company can be granted permission to access a BigQuery data set in the cloud, but still be restricted to specific rows, Vambenepe wrote in his April 16 blog post. 
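Conceptually, row-level permissions act as a per-user filter applied before results are returned: two users querying the same table see different rows. A minimal Python sketch of the idea (the grants table, user names, and row fields below are invented for illustration, not BigQuery's actual API):

```python
# Illustrative sketch of row-level permissions: each user sees only
# the rows their grant covers. Names and fields are hypothetical.
trades = [
    {"region": "us", "symbol": "GOOG", "price": 540.0},
    {"region": "eu", "symbol": "SAP", "price": 67.5},
    {"region": "us", "symbol": "AMZN", "price": 389.0},
]

# Hypothetical grants: user -> predicate selecting rows they may read.
grants = {
    "partner_a": lambda row: row["region"] == "us",
    "analyst": lambda row: True,  # unrestricted access
}

def query(user, rows):
    """Return only the rows the user's grant permits."""
    allowed = grants.get(user, lambda row: False)  # default: no access
    return [r for r in rows if allowed(r)]
```

Here `query("partner_a", trades)` returns only the two "us" rows, while an unknown user gets nothing, which is the behavior Vambenepe describes for outside partners restricted to specific rows.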

[Want to learn more about BigQuery competitors? See MongoDB Eyes Bigger, Faster NoSQL Deployments.]

The default ingestion limit for BigQuery has been raised to 100,000 rows per second, per table, with unlimited storage for handling large data analysis tasks. BigQuery runs SQL analytics against large structured data sets, much like a relational database system, or against loosely structured data assembled as JSON (JavaScript Object Notation) objects.

Several NoSQL systems, such as Cassandra and MongoDB, also work with JSON objects.
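The distinction is visible in how a single record is represented: a flat row with one value per fixed column, versus a JSON object that can nest fields. A small Python illustration (the field names are made up):

```python
import json

# A flat, relational-style row: fixed columns, one value per column.
flat_row = ("GOOG", 540.01, "2015-04-20")

# The same record as a loosely structured JSON object: nesting and
# optional keys are allowed, the shape BigQuery can also ingest.
json_row = json.loads("""
{
  "symbol": "GOOG",
  "quote": {"price": 540.01, "currency": "USD"},
  "date": "2015-04-20"
}
""")
```

The nested `quote` object has no direct equivalent in the flat row; that flexibility is what JSON-oriented stores like Cassandra and MongoDB trade on.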

The Google Cloud Platform also introduced the beta version of a new service, Google Cloud Dataflow. Cloud Dataflow provides event/time-based data stream processing, available as an on-demand service. Stream processing can also be scheduled as a batch service, if the Google Cloud user chooses.

A Cloud Dataflow user doesn't need to set up a cluster on which to run the stream-flow processing.

"Just write a program, submit it, and Cloud Dataflow will do the rest," Vambenepe wrote.

Stream processing and event-related processing are done on a data stream, such as a feed of stock trades from an exchange, with the system looking for trades at a particular level of pricing, or at particular time intervals. Stream processing can also be used against an application's server log, where it watches for particular software events in the application and triggers an alert when it spots one.
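The pattern described above, watching a stream and triggering an alert when a condition matches, can be sketched generically in Python (the price threshold, event shape, and function names are invented for illustration):

```python
def watch_stream(events, predicate, on_alert):
    """Scan an event stream and invoke on_alert for each match."""
    matches = []
    for event in events:
        if predicate(event):
            on_alert(event)  # e.g., notify an operator
            matches.append(event)
    return matches

# Example: flag stock trades above a hypothetical price threshold.
trades = [
    {"symbol": "GOOG", "price": 538.0},
    {"symbol": "GOOG", "price": 545.5},
]
high = watch_stream(trades, lambda t: t["price"] > 540.0, print)
```

The same skeleton covers the server-log case: swap the predicate for one that matches a particular software event, and the alert callback fires when it appears.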

Google's BigQuery processing and Cloud Dataflow stream analysis are now connected to another service -- Cloud Pub/Sub -- to notify selected IT administrators or business end-users when events occur. Vambenepe wrote that Cloud Pub/Sub "completes the platform's end-to-end support for low-latency data processing."
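Publish/subscribe decouples whatever detects an event from the people or systems that need to hear about it: publishers post to a named topic, and every subscriber on that topic receives the message. A minimal in-process sketch of the pattern in Python (this is the general pattern, not the Cloud Pub/Sub API):

```python
from collections import defaultdict

class PubSub:
    """Toy topic-based publish/subscribe broker, for illustration only."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to receive messages on a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to every subscriber on the topic."""
        for callback in self._subscribers[topic]:
            callback(message)

# Wire a hypothetical "alerts" topic to an admin's inbox (a list, here).
bus = PubSub()
inbox = []
bus.subscribe("alerts", inbox.append)
bus.publish("alerts", "trade above threshold: GOOG at 545.50")
```

Because subscribers register themselves, the stream processor that publishes the alert needs no knowledge of who is listening, which is what lets a notification service sit cleanly at the end of a low-latency pipeline.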

The data stream processing capabilities of open source systems such as Hadoop, Spark, and Flink may be used with BigQuery as well, Vambenepe wrote. Google will provide connectors between those systems and its BigQuery and Cloud Storage services.

"Scuba equipment helps humans operate under water," observed Vambenepe, but they're no match for the agility of creatures that belong in the water. "When it comes to big data and the cloud, be a dolphin, not a scuba diver," he concluded.

Attend Interop Las Vegas, the leading independent technology conference and expo series designed to inspire, inform, and connect the world's IT community. In 2015, look for all new programs, networking opportunities, and classes that will help you set your organization’s IT action plan. It happens April 27 to May 1. Register with Discount Code MPOIWK for $200 off Total Access & Conference Passes.

About the Author(s)

Charles Babcock

Editor at Large, Cloud

Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive Week. He is a graduate of Syracuse University where he obtained a bachelor's degree in journalism. He joined the publication in 2003.
