Google BigQuery Adds Data Streaming

Developers and businesses can now analyze data as it gets entered into BigQuery.
Google's BigQuery, a Web service for analyzing large amounts of data, is about to become more efficient, gain the ability to query subsets of data and get a refreshed interface.

On Wednesday, Google plans to introduce several new features: support for real-time data streaming in BigQuery, the ability to query portions of a table, the query functions SUM and COUNT, and interface improvements designed to enhance productivity.

BigQuery was launched last year as a tool for interactive data analysis. Unlike Google Cloud SQL, it's not a database. Rather, it brings MySQL-style querying to a NoSQL datastore.

With additional tools, Hadoop clusters can be deployed to query multi-terabyte datasets, but the resulting system probably won't return query results as rapidly.

Raj Pai, CEO of social analytics company Claritics, said in a Google case study that time-consuming complex queries of large data sets on Hadoop clusters can be processed by BigQuery in as little as 20 seconds. As a consequence, his company has been able to develop apps four times faster and to spend about 40% less time focused on IT infrastructure.

Similar offerings from other companies include Amazon Elastic MapReduce, IBM BigInsights and Microsoft Azure HDInsight.

While BigQuery was designed to perform SQL-like queries on large datasets quickly, speed can still be an issue: It can take a long time to move large amounts of data into the cloud.

That's where real-time data streaming comes in. Developers and businesses can now stream data row-by-row using a new API call. This allows data processing to begin immediately, rather than waiting for the data to be uploaded to a cache for batch processing.
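To illustrate what a row-by-row insert can look like, here is a minimal Python sketch that only builds a request body and sends nothing over the network. It assumes the call in question is the `tabledata.insertAll` method of BigQuery's REST API, which the article does not name. The helper `build_insert_all_body` and the event fields are hypothetical.

```python
import json
import uuid


def build_insert_all_body(rows):
    """Wrap plain dicts in a request body shaped for tabledata.insertAll.

    Assumption: this is the streaming call the article refers to.
    """
    return {
        "kind": "bigquery#tableDataInsertAllRequest",
        "rows": [
            # A unique insertId per row helps the service spot retried rows.
            {"insertId": str(uuid.uuid4()), "json": row}
            for row in rows
        ],
    }


# Hypothetical events; the field names are illustrative only.
events = [
    {"user_id": "u1", "action": "login"},
    {"user_id": "u2", "action": "purchase"},
]

body = build_insert_all_body(events)
print(json.dumps(body, indent=2))
```

In a real application this body would be sent with an authenticated HTTP POST for the target table. Authentication and error handling are left out of the sketch.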

Google is offering streaming ingestion for free until Jan. 1, 2014. Thereafter, streamed data will be billed at a rate of $0.01 per 10,000 rows inserted. Batch-based data ingestion will remain free.
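To put that rate in perspective, the short Python sketch below estimates a streaming bill from a row count. It assumes billing scales proportionally with rows inserted, with no minimums or rounding rules, which the article does not specify. The function name `streaming_cost` is illustrative.

```python
from decimal import Decimal

# Rate quoted in the article: $0.01 per 10,000 rows inserted.
PRICE_PER_BLOCK = Decimal("0.01")
ROWS_PER_BLOCK = 10_000


def streaming_cost(rows_inserted: int) -> Decimal:
    """Estimated charge in dollars, assuming simple proportional billing."""
    if rows_inserted < 0:
        raise ValueError("rows_inserted must be non-negative")
    return PRICE_PER_BLOCK * rows_inserted / ROWS_PER_BLOCK


for rows in (10_000, 1_000_000, 250_000_000):
    print(f"{rows:>12,} rows -> ${streaming_cost(rows):.2f}")
```

At this rate, a million streamed rows works out to about $1, and 250 million rows to about $250.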

Queries can now target a subset of a table by adding a "table decorator" to an SQL statement. Table decorators are limited to data inserted within the last 24 hours.

Beyond the cost benefits of concise queries, Google product manager Ju-kay Kwek said in a blog post that table decorators can be used in conjunction with real-time data streaming to do things like monitor user activity during a recent time period, such as the hours following the introduction of a Web app update.
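As a rough sketch of how a query over a recent time window might be written, the Python helper below builds a query string with a relative range decorator. The `[table@-<milliseconds>-]` form is the decorator syntax documented for BigQuery's original SQL dialect, and the article itself does not show it, so treat it as an assumption. The table name and the helper `recent_rows_query` are hypothetical.

```python
MAX_MINUTES = 24 * 60  # the 24-hour limit reported in the article


def recent_rows_query(table: str, minutes: int) -> str:
    """Build a query counting rows added in the last `minutes` minutes.

    Assumption: a relative range decorator, "@-<ms>-", selects data
    added between <ms> milliseconds ago and now.
    """
    if not 0 < minutes <= MAX_MINUTES:
        raise ValueError("minutes must be between 1 and 1440")
    offset_ms = minutes * 60 * 1000
    return f"SELECT COUNT(*) FROM [{table}@-{offset_ms}-]"


# Hypothetical table that receives streamed user-activity rows.
print(recent_rows_query("mydataset.user_events", 60))
```

Because the decorator narrows the scan to recently added data, a monitoring query like this one reads far less than the full table.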

The new SUM and COUNT functions expand BigQuery's statistical capabilities. The BigQuery interface has also been enhanced with an expanding information panel that provides more detail about queries and with action buttons at the bottom of the query box.

Kwek said in an email that Google sees BigQuery being used by a wide range of industries, including e-commerce, retail, logistics and operations. Service partners such as PA Consulting and Saama Technologies have also helped companies in specialized industries like healthcare implement BigQuery.

Kwek said BigQuery and Amazon Elastic MapReduce (EMR) serve different functions. "BigQuery is well suited for businesses who need to analyze large amounts of data in an ad hoc and iterative manner, who can't or don't want to build and manage a lot of technical infrastructure," he said. "EMR is very different; as a general purpose framework for running MapReduce jobs it's powerful and flexible, but requires significant investments in infrastructure and management."

Google doesn't disclose user figures for specific services, but the company says it has 3 million active applications running on its Cloud Platform and about 300,000 unique developers using its services every month.