Cockroach Labs announces $6.25 million in funding for its big data system that it says will survive calamity and maintain data integrity.
"The joke is -- maybe it's not a joke -- that cockroaches will survive World War III," began Spencer Kimball, haltingly, at CoreOS Fest in San Francisco May 5, as he explained what CockroachDB software is all about. "Mainly," he said, "CockroachDB survives."
Kimball is CEO of Cockroach Labs, a company founded in February with experienced engineers from Google. His CoreOS Fest talk was an early airing of the goals of the CockroachDB system. Kimball worked at Google for more than nine years and helped develop its Colossus distributed file system, work that stands him in good stead as he functions as CEO of a team trying to produce a database system that can function around the globe and never go down.
On Thursday, Cockroach Labs is announcing $6.25 million in funding for the nine-employee company after a sustained background effort to get CockroachDB established as a widely accepted open source project. Benchmark Capital leads the Series A round, with Google Ventures participating.
CockroachDB is now "on the cusp of alpha," or release for early, non-production use by developers, Kimball said in advance of the funding announcement.
CockroachDB is open source code that tries to match the characteristics of Spanner, Google's database system for spanning the globe. Spanner makes indexes of Web crawler information instantly available for the Google Search engine. With split-second timing it manages the user lookups and ad servings that accompany individual searches.
Many enterprises would adopt Spanner if they could. But it's not open source, and it depends on other Google technologies, such as Colossus, that are not available outside Google. CockroachDB is an attempt to provide a standalone system that has Spanner's scalability, survivability, and data integrity. Data inside Spanner is consistent around the globe, with updates coordinated by its own atomic clock system rather than by NTP. CockroachDB plans to duplicate Spanner's scalability and survivability, but most of its users won't need their own atomic clock system, so it's skipping that part.
The main goal is to get a distributed database that is highly survivable and maintains precisely synchronized data throughout the system, no matter how broadly it's distributed. For existing big data systems and for most NoSQL systems, data consistency throughout the system is still a distant goal. "Consistency is very important. But consistency is very, very hard," Kimball said.
Data consistency, like that provided by Spanner, is the bugaboo of proliferating NoSQL systems, which can gorge on huge amounts of data. But as database interactions pile up, the guarantee of data consistency declines. Most NoSQL systems boast "eventual consistency," where the results of data writes will eventually catch up with data reads.
That makes transactions a big problem for NoSQL systems. The user can't be sure the information used in an attempted transaction reflects the most recent changes. Precise data consistency requires assured transactions that finish updating the system before any reads are executed against the target data.
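The gap between the two models can be illustrated with a toy sketch in Python. It is not CockroachDB or Spanner code; the class and method names are hypothetical, and a single in-memory "replica" stands in for a distributed system. A write lands on the primary first, so a read from the lagging replica can miss it, while a consistent read waits until pending writes have been applied.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy store (hypothetical): writes reach the replica only on sync()."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # writes not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_replica(self, key):
        # May return stale data: the replica lags behind the primary.
        return self.replica.get(key)

    def read_consistent(self, key):
        # Apply every pending write first, so the read sees all prior writes.
        self.sync()
        return self.replica.get(key)

    def sync(self):
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


store = EventuallyConsistentStore()
store.write("balance", 100)
stale = store.read_replica("balance")     # None: replica has not caught up
fresh = store.read_consistent("balance")  # 100: pending writes applied first
```

In this sketch the stale read is harmless, but a transaction that debited an account based on it would be working from out-of-date information.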
Cockroach is shooting for data consistency across the system, no matter how many locations the database has been propagated to, Kimball said.
On Facebook, it doesn't necessarily matter if the number of "likes" for your picture of granny's baked beans is off by one or two respondents for three seconds. But for financial and other types of transactions, including those that update the database, "eventual consistency" is anathema to accurate operations.
"After you've written something, you should read what you just wrote one or two milliseconds later," but that's not the case with all NoSQL system interactions. "Most NoSQL systems only supply eventual consistency," he said.
CockroachDB, like its namesake, is able to propagate itself without human intervention. If new servers are added to the cluster, it recognizes the fact, propagates data to them, and adds them to its processing operations. Increases in traffic will trigger a horizontal scaling out by CockroachDB. The loss of a server or servers will prompt it to seek additional compute power elsewhere and rebalance the load.
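A rough sense of what rebalancing involves can be had from a few lines of Python. This is an illustrative sketch only, not CockroachDB's algorithm: the `rebalance` function, the node names, and the rule of evening out chunk counts are all assumptions made for the example.

```python
def rebalance(placement):
    """Move chunks from the fullest node to the emptiest until
    node chunk counts differ by at most one.

    `placement` maps node name -> list of chunk ids and is modified in place.
    Returns the number of chunks moved.
    """
    moves = 0
    while True:
        fullest = max(placement, key=lambda n: len(placement[n]))
        emptiest = min(placement, key=lambda n: len(placement[n]))
        if len(placement[fullest]) - len(placement[emptiest]) <= 1:
            return moves
        placement[emptiest].append(placement[fullest].pop())
        moves += 1


# Two nodes hold six chunks each; a new, empty server joins the cluster.
cluster = {"node-a": list(range(0, 6)), "node-b": list(range(6, 12))}
cluster["node-c"] = []
moved = rebalance(cluster)
sizes = sorted(len(chunks) for chunks in cluster.values())  # [4, 4, 4]
```

Here four chunks move to the new server and each node ends up with four. A real system would also weigh traffic, disk usage, and data center location, none of which this sketch models.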
If CockroachDB achieves its objectives, database failure and recovery will become a rare event. CockroachDB spreads its data around in small 32 or 64 MB chunks on different servers, with multiple copies of the database engine knowing where each chunk is. When a large amount of data is needed for processing, it streams in from many sources "using the CPU, memory, and network bandwidth of many nodes" to reduce the data latency. Multiple copies of the data are kept to ensure the loss of one copy won't leave any gaps in the database.
The size of the distributed data chunks may vary by user, Kimball said, but most will keep them in the 32 to 256 MB range for quick movement of data between cluster nodes and between data centers.
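The chunk-and-copy idea can be sketched in Python as follows. The 64 MB chunk size comes from the figures above; the replication factor of three, the round-robin placement, and the function names are assumptions for illustration, not details Cockroach Labs has confirmed.

```python
import math

CHUNK_MB = 64   # chunk size taken from the 32-or-64 MB figure above
REPLICAS = 3    # assumed replication factor, not stated by Cockroach Labs


def place_chunks(total_mb, nodes, chunk_mb=CHUNK_MB, replicas=REPLICAS):
    """Return {chunk_id: [node, ...]} with each chunk's copies on distinct nodes."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    chunk_count = math.ceil(total_mb / chunk_mb)
    # Round-robin: chunk k goes to nodes k, k+1, ... (wrapping around).
    return {
        chunk: [nodes[(chunk + i) % len(nodes)] for i in range(replicas)]
        for chunk in range(chunk_count)
    }


def surviving_chunks(placement, failed):
    """Chunks that still have at least one copy outside the failed nodes."""
    return {c for c, holders in placement.items() if set(holders) - set(failed)}


nodes = ["n1", "n2", "n3", "n4", "n5"]
placement = place_chunks(total_mb=1000, nodes=nodes)
chunk_count = len(placement)  # 16 chunks for 1,000 MB of data
intact = surviving_chunks(placement, failed={"n2", "n3"})
```

With three copies of every chunk on separate servers, losing any two of the five servers in this example still leaves at least one copy of all 16 chunks.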
If CockroachDB works as planned, it will be a distributed database that is self-mapping and self-balancing. It will also be able to recover from a major loss of hardware without disrupting operations. Failover and mean time to recovery will become outmoded terms.
But in addition to that, it will have precisely synchronized data throughout a distributed system. "It will be consistent all the time, a wonderful feature," Kimball said.
Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive ...