A recently published paper sheds light on an important but seldom discussed Google storage system.
Google's success owes a lot to its computing infrastructure. The company's accomplished engineers have developed and deployed innovations such as MapReduce, a way to process large data sets; Bigtable, a distributed storage system; Sawzall, an interpreted programming language for analyzing large distributed data sets; the Google File System, a distributed file system; and Google Workqueue, a distributed query management system.
To this list, add Megastore, the storage system that supports Google App Engine, among other applications. Megastore has been used for several years at Google. It was discussed at the SIGMOD 2008 conference, but details about the technology have only recently been published, in conjunction with last month's Conference on Innovative Data Systems Research (CIDR).
"Megastore blends the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way, and provides both strong consistency guarantees and high availability," the paper states. "We provide fully serializable ACID semantics within fine-grained partitions of data."
Web applications today, the paper says, must be highly scalable, must compete for users through rapid development, must respond with low latency, must give users a consistent view of their data -- no spreadsheets vanishing into the cloud -- and must be available at all times.
"These requirements are in conflict," the paper states. "Relational databases provide a rich set of features for easily building applications, but they are difficult to scale to hundreds of millions of users. NoSQL datastores such as Google's Bigtable, Apache Hadoop's HBase, or Facebook's Cassandra are highly scalable, but their limited API and loose consistency models complicate application development. Replicating data across distant data centers while providing low latency is challenging, as is guaranteeing a consistent view of replicated data, especially during faults."
Having dismissed traditional RDBMS (relational database management system) and open source databases like MySQL, the paper also knocks "expensive commercial database systems like Oracle [which] significantly increase the total cost of ownership in large deployments in
Megastore is designed to replicate write operations synchronously across a wide-area network with reasonable latency and to support graceful failover between data centers. It aims to strike a middle ground between the scalability of NoSQL databases and the convenience of a traditional RDBMS.
James Hamilton, a VP and distinguished engineer at Amazon.com, has noted the limited public information about Megastore in several personal blog posts over the years and expressed qualified admiration for the technology when Google's paper was published. "Supporting consistent read and full ACID update semantics is impressive although the limitation of not being able to update an entity group at more than a 'few per second' is limiting," he wrote.
The paper states that over 100 production applications use Megastore as their storage service and that most of Google's customers see availability of 99.999% or higher for these applications. The average read latency for these applications is in the tens of milliseconds range and average write latency ranges from 100 to 400 milliseconds, depending on data center distance and the size of the write operation.
Google declined to comment, preferring to let the paper speak for itself.