Google Spills Megastore's Secrets - InformationWeek

Google Spills Megastore's Secrets

A recently published paper sheds light on an important but seldom discussed Google storage system.

Google's success owes a lot to its computing infrastructure. The company's accomplished engineers have developed and deployed innovations like MapReduce, a way to process large data sets; Bigtable, a distributed storage system; Sawzall, an interpreted programming language for analyzing large distributed data sets; the Google File System, a distributed file system; and Google Workqueue, a distributed query management system.

To this list, add Megastore, the storage system that supports Google App Engine, among other applications. Megastore has been used for several years at Google. It was discussed at the SIGMOD 2008 conference, but detailed information about the technology has only recently been published, in conjunction with last month's Conference on Innovative Data Systems Research (CIDR).

The paper detailing the technology, "Megastore: Providing Scalable, Highly Available Storage for Interactive Services," describes a storage system tailored to modern interactive online services.

"Megastore blends the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way, and provides both strong consistency guarantees and high availability," the paper states. "We provide fully serializable ACID semantics within fine-grained partitions of data."

Web applications today, the paper says, must be highly scalable, must compete for users through rapid development, must respond with low latency, must give users a consistent view of their data -- no spreadsheets vanishing into the cloud -- and must be available at all times.

"These requirements are in conflict," the paper states. "Relational databases provide a rich set of features for easily building applications, but they are difficult to scale to hundreds of millions of users. NoSQL datastores such as Google's Bigtable, Apache Hadoop's HBase, or Facebook's Cassandra are highly scalable, but their limited API and loose consistency models complicate application development. Replicating data across distant data centers while providing low latency is challenging, as is guaranteeing a consistent view of replicated data, especially during faults."

Having dismissed traditional RDBMS (relational database management system) and open source databases like MySQL, the paper also knocks "expensive commercial database systems like Oracle [which] significantly increase the total cost of ownership in large deployments in the cloud."

Megastore is designed to replicate write operations synchronously across a wide-area network with reasonable latency and to support graceful failover between data centers. It aims to strike a middle ground between the scalability of NoSQL databases and the convenience of a traditional RDBMS.
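
To make the idea of synchronous replication concrete, here is a minimal Python sketch. It is not Megastore's protocol, which this article does not describe; the `Replica` class, the majority-acknowledgement rule, and the round-trip figures are all assumptions chosen for illustration. It shows why a write that waits on remote data centers costs at least one wide-area round trip, and how a write can still succeed when one site is down.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one data center holding a copy of the data."""
    name: str
    round_trip_ms: float          # assumed network round trip from the writer
    available: bool = True
    log: list = field(default_factory=list)

    def apply(self, record: str) -> bool:
        """Append the record if this replica is reachable."""
        if not self.available:
            return False
        self.log.append(record)
        return True


def synchronous_write(replicas, record):
    """Send the write to every replica; commit only if a majority acknowledges.

    Returns (committed, latency_ms). Latency is modeled as the round trip of
    the slowest replica needed to reach a majority, assuming all replicas are
    contacted in parallel. This toy does not roll back a failed write.
    """
    acked = sorted(r.round_trip_ms for r in replicas if r.apply(record))
    needed = len(replicas) // 2 + 1
    if len(acked) < needed:
        return False, None
    return True, acked[needed - 1]


replicas = [
    Replica("local", 1.0),
    Replica("nearby", 30.0),
    Replica("distant", 120.0),
]

print(synchronous_write(replicas, "row-1"))   # (True, 30.0)
replicas[1].available = False                 # simulate a data center outage
print(synchronous_write(replicas, "row-2"))   # (True, 120.0)
```

In this toy model the write survives the loss of one site, but its latency jumps to the round trip of the more distant replica, which is consistent with the paper's observation that write latency depends on data center distance.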

James Hamilton, a VP and distinguished engineer, has noted the limited public information about Megastore in several personal blog posts over the years and expressed qualified admiration for the technology when Google's paper was published. "Supporting consistent read and full ACID update semantics is impressive although the limitation of not being able to update an entity group at more than a 'few per second' is limiting," he wrote.

The paper states that over 100 production applications use Megastore as their storage service and that most of Google's customers see availability of 99.999% or higher for these applications. The average read latency for these applications is in the tens of milliseconds range and average write latency ranges from 100 to 400 milliseconds, depending on data center distance and the size of the write operation.
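
For a sense of what 99.999% availability means in practice, the short Python calculation below converts an availability percentage into the maximum downtime it allows per year. The function name and the 365-day year are assumptions made for this illustration, not figures from the paper.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> {max_downtime_minutes(pct):.2f} minutes per year")
```

At 99.999%, the budget works out to a little over five minutes of downtime per year.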

Google declined to comment, preferring to let the paper speak for itself.
