Google's Spanner Database Offers Global Scale
Spanner, Google's latest database technology, relies on atomic clocks to ensure the consistency of distributed data.
September 19, 2012
Google has released details of a global distributed database called Spanner, which the company claims is the first system to distribute data globally while supporting consistent distributed transactions.
Google began deploying Spanner in 2011 as part of an effort to rewrite F1, its internal advertising backend. A Spanner deployment, as befits Google's ambitions, is called a universe. The company has three universes: a test/playground universe, a development/production universe, and a production-only universe.
F1 was originally based on a MySQL database that handled tens of terabytes of data. Its initial structure involved a sharding scheme--a way to divide the database--that assigned customer data to a fixed shard. This allowed the use of indexes and sophisticated processing of queries, but proved difficult and costly when resharding--a resource-intensive manual process--was necessary to accommodate database growth.
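To see why resharding a fixed scheme is so costly, consider a minimal sketch. It assumes a simple hash-modulo assignment, which is a common approach but not necessarily the one F1 used; the function names and customer IDs are hypothetical.

```python
import zlib


def shard_for(customer_id: str, num_shards: int) -> int:
    """Assign a customer to a shard with a stable hash (assumed scheme, not F1's)."""
    return zlib.crc32(customer_id.encode("utf-8")) % num_shards


def fraction_moved(customer_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of customers whose shard changes when the shard count changes."""
    moved = sum(
        1
        for cid in customer_ids
        if shard_for(cid, old_shards) != shard_for(cid, new_shards)
    )
    return moved / len(customer_ids)


# Hypothetical customer IDs, for illustration only.
customers = [f"customer-{i}" for i in range(10_000)]

if __name__ == "__main__":
    # Adding a single shard relocates most customers under this scheme.
    print(f"moved: {fraction_moved(customers, 8, 9):.0%}")
```

Under this assumed scheme, growing from eight shards to nine forces most customer records to move, which suggests why a system that reshards automatically would be attractive.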
Because of the complexity of resharding F1 data, Google's engineers began trying to limit the growth of the F1 MySQL database by storing data using Bigtable, a NoSQL data storage technology that the company developed in 2004 and continues to rely on. But Bigtable limited both the transactional data visibility Google needed and the ability to query across the entire data set.
Google has always put a premium on the ability to operate at scale, and Spanner continues that emphasis.
"Spanner automatically reshards data across machines as the amount of data or the number of servers changes, and it automatically migrates data across machines (even across datacenters) to balance load and in response to failures," Google's research paper explains. "Spanner is designed to scale up to millions of machines across hundreds of datacenters and trillions of database rows."
Google fellow Jeff Dean, one of more than two dozen Google engineers credited as co-authors of the paper, described early work on Spanner back in 2009 at the third ACM SIGOPS International Workshop on Large Scale Distributed Systems and Middleware (LADIS).
Spanner provides applications with the ability to dynamically control where data resides--data stored close to the user can be delivered faster and data replicas stored near each other can be written to faster--and how many times data is replicated. The system can move data dynamically and transparently between data centers to balance resource usage. What's more, it can read and write data consistently on a global scale--something that is difficult for databases to do.
The key to Spanner is what Google calls the TrueTime API, which allows timestamps to be reliably applied to database operations around the globe. Without sufficiently accurate timestamps, network latency and clock variations can lead to synchronization errors in distributed systems.
To achieve a globally consistent time system, Google has deployed a set of atomic clocks and GPS hardware in every data center. Without redundant time references, Spanner would become unreliable, which isn't what Google wants with its revenue-generating ad system. The paper's authors argue that future distributed systems should no longer accept loosely synchronized clocks.
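The idea of timestamps with bounded uncertainty can be sketched in a few lines. This is loosely modeled on the interval-based clock described in Google's paper, not on Spanner's actual code: the class names, the fixed `epsilon`, and the polling loop are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TTInterval:
    """A time reading expressed as a range that contains the true time."""
    earliest: float
    latest: float


class TrueTimeSketch:
    """Toy clock with a fixed uncertainty bound (epsilon must be > 0)."""

    def __init__(self, epsilon: float, clock: Callable[[], float] = time.monotonic):
        self.epsilon = epsilon
        self.clock = clock

    def now(self) -> TTInterval:
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t: float) -> bool:
        """True only once t has definitely passed, even allowing for clock error."""
        return self.now().earliest > t


def commit_with_wait(tt: TrueTimeSketch, sleep=time.sleep) -> float:
    """Pick a timestamp, then wait until it is certainly in the past."""
    s = tt.now().latest
    while not tt.after(s):
        sleep(tt.epsilon / 10)
    return s


if __name__ == "__main__":
    tt = TrueTimeSketch(epsilon=0.005)  # 5 ms uncertainty, chosen arbitrarily
    first = commit_with_wait(tt)
    second = commit_with_wait(tt)
    print(first < second)
```

Because each commit waits out the uncertainty window before returning, a later commit in this sketch always receives a larger timestamp. The tighter the clock bound, the shorter the wait, which hints at why Google invested in atomic clocks and GPS receivers.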
"This is more or less NoSQL throwing in the towel," said Jim Starkey, CTO of NuoDB, a company that makes a cloud database system with goals similar to Spanner, in a phone interview.
Google pushed MySQL to the breaking point, said Starkey, noting that the company's data was heavily sharded and didn't scale. NoSQL, in the form of Bigtable, wasn't the answer for Google's distributed ad system because it promises only eventual data consistency. When data is accessed in real time, he explained, the information might not be consistent throughout the system. NoSQL trades guarantees of atomicity, consistency, isolation, and durability--the ACID properties that ensure data reliability during database operations--for performance in distributed systems at scale.
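A toy model shows what eventual consistency can look like to an application. It is not a model of Bigtable's internals; the class, the two-replica setup, and the "budget" key are hypothetical, and replication is simulated with an explicit `propagate()` step.

```python
class EventuallyConsistentStore:
    """Toy store: writes land on replica 0 and reach the others later."""

    def __init__(self, num_replicas: int):
        self.replicas = [dict() for _ in range(num_replicas)]
        self.pending = []  # queued (replica_index, key, value) updates

    def write(self, key, value):
        # The write is applied to one replica immediately...
        self.replicas[0][key] = value
        # ...and queued for the rest.
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def read(self, key, replica: int):
        return self.replicas[replica].get(key)

    def propagate(self):
        """Deliver queued updates, in order, so all replicas converge."""
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()


if __name__ == "__main__":
    store = EventuallyConsistentStore(num_replicas=2)
    store.write("budget", 100)
    store.propagate()
    store.write("budget", 50)
    # Until propagate() runs again, the two replicas disagree.
    print(store.read("budget", replica=0), store.read("budget", replica=1))
    store.propagate()
    print(store.read("budget", replica=0), store.read("budget", replica=1))
```

Between the second write and the next `propagate()` call, a reader hitting the lagging replica sees the old value. An application that needs transactional guarantees has to work around that window itself, which is the burden Starkey describes.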
"Trying to get ACID transactions on Bigtable is a programming nightmare and too difficult for most programmers to do," said Starkey.
This is something Google itself acknowledges in its paper: "[W]e have also consistently received complaints from users that Bigtable can be difficult to use for some kinds of applications: those that have complex, evolving schemas, or those that want strong consistency in the presence of wide-area replication."
Starkey says Spanner is very interesting technology, but he's not sure whether it will be useful for other businesses that don't deal with the same distributed data challenges as Google. Even Google, he suggests, may not end up using it for its external-facing applications like Gmail or Picasa, because those apps don't have the same data access requirements as F1.
"The question is can Google really get the throughput that they need for their applications," said Starkey. "My take on it is that Spanner is quite complicated to configure, install, and manage. But it's really exciting to see another company going out there to deal with databases in a way that validates the assumptions we made."
About the Author(s)
Editor at Large, Enterprise Mobility
Thomas Claburn has been writing about business and technology since 1996, for publications such as New Architect, PC Computing, InformationWeek, Salon, Wired, and Ziff Davis Smart Business. Before that, he worked in film and television, having earned a not particularly useful master's degree in film production. He wrote the original treatment for 3DO's Killing Time, a short story that appeared in On Spec, and the screenplay for an independent film called The Hanged Man, which he would later direct. He's the author of a science fiction novel, Reflecting Fires, and a sadly neglected blog, Lot 49. His iPhone game, Blocfall, is available through the iTunes App Store. His wife is a talented jazz singer; he does not sing, which is for the best.