A recently published paper sheds light on an important but seldom discussed Google storage system.
Google's success owes a lot to its computing infrastructure. The company's accomplished engineers have developed and deployed innovations like MapReduce, a way to process large data sets; Bigtable, a distributed storage system; Sawzall, an interpreted programming language for analyzing large distributed data sets; the Google File System, a distributed file system; and Google Workqueue, a distributed query management system.
To this list, add Megastore, the storage system that supports Google App Engine, among other applications. Megastore has been used at Google for several years. It was discussed at the SIGMOD 2008 conference, but details about the technology have only recently been published, in conjunction with last month's Conference on Innovative Data Systems Research (CIDR).
"Megastore blends the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way, and provides both strong consistency guarantees and high availability," the paper states. "We provide fully serializable ACID semantics within fine-grained partitions of data."
Web applications today, the paper says, have to be highly scalable, win users through rapid development, respond with low latency, present users with a consistent view of their data -- no spreadsheets vanishing into the cloud -- and remain available at all times.
"These requirements are in conflict," the paper states. "Relational databases provide a rich set of features for easily building applications, but they are difficult to scale to hundreds of millions of users. NoSQL datastores such as Google's Bigtable, Apache Hadoop's HBase, or Facebook's Cassandra are highly scalable, but their limited API and loose consistency models complicate application development. Replicating data across distant data centers while providing low latency is challenging, as is guaranteeing a consistent view of replicated data, especially during faults."
Having dismissed traditional RDBMS (relational database management system) and open source databases like MySQL, the paper also knocks "expensive commercial database systems like Oracle [which] significantly increase the total cost of ownership in large deployments in
Megastore is designed to replicate write operations synchronously across a wide-area network with reasonable latency, and to support graceful failover across data centers. It aims to strike a middle ground between the scalability of NoSQL databases and the convenience of a traditional RDBMS.
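Synchronous replication means a write is acknowledged only after enough data centers have accepted it. The Megastore paper builds this on Paxos; the Python sketch below is a much simpler majority-quorum model, not Paxos and not Google's implementation, and the `Replica` class and `synchronous_write` function are hypothetical names chosen for illustration. It shows why losing one of three data centers need not block writes, while losing two does.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical data-center replica holding an append-only write log."""
    name: str
    available: bool = True
    log: list = field(default_factory=list)


def synchronous_write(replicas, entry):
    """Simplified majority-quorum write (an illustration, not Paxos).

    The write succeeds only if a strict majority of replicas is reachable;
    otherwise it is rejected and no replica records it. Replicas that were
    unreachable simply miss the entry here; a real system would catch
    them up later.
    """
    reachable = [r for r in replicas if r.available]
    if len(reachable) <= len(replicas) // 2:
        return False  # no majority: reject the write entirely
    for r in reachable:
        r.log.append(entry)
    return True  # acknowledged only after the majority has the entry


if __name__ == "__main__":
    dcs = [Replica("dc-a"), Replica("dc-b"), Replica("dc-c")]
    print(synchronous_write(dcs, ("k", "v1")))  # all three reachable
    dcs[2].available = False
    print(synchronous_write(dcs, ("k", "v2")))  # 2 of 3 is still a majority
    dcs[1].available = False
    print(synchronous_write(dcs, ("k", "v3")))  # 1 of 3 is not
```

The majority rule is what lets a replicated store fail over gracefully: any two majorities overlap in at least one replica, so a surviving majority always holds every acknowledged write.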
James Hamilton, a VP and distinguished engineer at Amazon.com, has noted the limited public information about Megastore in several personal blog posts over the years and expressed qualified admiration for the technology when Google's paper was published. "Supporting consistent read and full ACID update semantics is impressive although the limitation of not being able to update an entity group at more than a 'few per second' is limiting," he wrote.
The paper states that over 100 production applications use Megastore as their storage service and that most of Google's customers see availability of 99.999% or higher for these applications. The average read latency for these applications is in the tens of milliseconds range and average write latency ranges from 100 to 400 milliseconds, depending on data center distance and the size of the write operation.
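The figures above invite two quick back-of-envelope checks: how much downtime 99.999% availability allows, and how average write latency might bound the update rate of a single entity group, the unit Hamilton's "few per second" remark refers to. The sketch below assumes a 365-day year and assumes commits to one entity group run strictly one after another with no batching or pipelining; both assumptions are ours, not the paper's, and the function names are hypothetical.

```python
# Back-of-envelope checks on the figures quoted in the article.
# Assumptions (not from the paper): a 365-day year, and commits to a
# single entity group applied strictly one after another with no batching.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Minutes per year a service may be unavailable at the given availability."""
    return (1.0 - availability) * SECONDS_PER_YEAR / 60


def max_serial_commits_per_second(write_latency_ms: float) -> float:
    """Upper bound on commits per second if each commit must finish
    before the next one to the same entity group can start."""
    return 1000.0 / write_latency_ms


if __name__ == "__main__":
    print(f"99.999% -> {downtime_minutes_per_year(0.99999):.2f} min/year")
    for ms in (100, 400):
        print(f"{ms} ms writes -> {max_serial_commits_per_second(ms):.1f} commits/s")
```

At five nines the budget is roughly 5.26 minutes of downtime a year, and at 100 to 400 ms per write a strictly serialized entity group would top out around 2.5 to 10 commits per second, which is consistent with the "few per second" ceiling Hamilton mentions. Real throughput also depends on factors this sketch ignores, such as write conflicts and batching.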
Google declined to comment, preferring to let the paper speak for itself.