VoltDB 2.0, released Tuesday, has gained "three knobs" that a database administrator can turn to trade off durability against performance, said Fred Holahan, chief marketing officer, in an interview. A new feature, Command Logging, makes VoltDB a highly durable, highly recoverable production system. At the direction of the database administrator, it batches sets of transaction commands and writes them to disk, while periodically taking snapshots of the data. Intervals between writes would be short, such as 100 milliseconds, allowing 10-20 complex transactions to be gathered up and stored, Holahan explained.
In the event of a complete system failure, such as a data center fire, the system will automatically restore itself from the commands and data snapshots that it retrieves from disk, with only the last 10-20 transactions lost. This approach adds a 5-10% performance overhead to normal transaction processing, so maximum performance takes a hit, he noted.
Shortening the interval below 100 milliseconds means fewer transactions will be lost, but it incurs an additional performance penalty. Lengthening it improves performance but increases the exposure to data loss, reducing what was previously nearly 100% data durability.
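The tradeoff between write interval and data loss can be illustrated with a toy simulation. This is a minimal sketch, not VoltDB's implementation: the class `CommandLoggedStore`, its methods, the 10-millisecond transaction spacing, and the snapshot schedule are all hypothetical choices made for illustration, and time is simulated rather than measured.

```python
class CommandLoggedStore:
    """Toy in-memory key-value store with a batched command log (illustrative only)."""

    def __init__(self, flush_interval_ms):
        self.flush_interval_ms = flush_interval_ms
        self.data = {}           # in-memory state, lost in a crash
        self.pending = []        # commands not yet written to "disk"
        self.disk_log = []       # stands in for the durable command log
        self.disk_snapshot = {}  # stands in for the durable data snapshot
        self.last_flush_ms = 0

    def execute(self, now_ms, key, value):
        # Apply the command in memory, then queue it for the next log write.
        self.data[key] = value
        self.pending.append((key, value))
        if now_ms - self.last_flush_ms >= self.flush_interval_ms:
            self.flush(now_ms)

    def flush(self, now_ms):
        # Write the whole batch of queued commands to the log in one step.
        self.disk_log.extend(self.pending)
        self.pending.clear()
        self.last_flush_ms = now_ms

    def snapshot(self):
        # A snapshot captures all current data, so older log entries are no longer needed.
        self.disk_snapshot = dict(self.data)
        self.disk_log.clear()
        self.pending.clear()

    def recover_after_crash(self):
        # Rebuild state from the snapshot, then replay the logged commands in order.
        state = dict(self.disk_snapshot)
        for key, value in self.disk_log:
            state[key] = value
        return state


def transactions_lost(flush_interval_ms, n_txns=97, spacing_ms=10, snapshot_every=40):
    """Run n_txns transactions, crash, recover, and count what did not survive."""
    store = CommandLoggedStore(flush_interval_ms)
    for i in range(1, n_txns + 1):
        store.execute(now_ms=i * spacing_ms, key=f"txn{i}", value=i)
        if i % snapshot_every == 0:
            store.snapshot()
    recovered = store.recover_after_crash()
    return n_txns - len(recovered)


if __name__ == "__main__":
    for interval in (50, 100, 200):
        print(interval, "ms interval ->", transactions_lost(interval), "transactions lost")
```

In this sketch, only the commands queued since the last write are lost in a crash, so a shorter interval leaves fewer transactions exposed. The performance cost of more frequent writes is not modeled here.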
The system is the brainchild of Michael Stonebraker, the former University of California at Berkeley database guru, who is now an adjunct professor at MIT. VoltDB comes out of the H-Store research project at MIT, Yale, and Brown universities.
The system is designed for large-scale Web operations, such as social networking games, financial trading, digital ad serving, or telecommunications applications. NoSQL systems sometimes stand in for relational databases in these applications, but VoltDB is designed to maintain transaction consistency and integrity, while NoSQL systems often rely on "eventual" consistency, meaning there can be a fraction of a second when data lacks integrity because the latest updates haven't been applied.
VoltDB is both an in-memory and a distributed system. It eliminates calls to disk by keeping both the database system and the data in memory, using the server RAM in a cluster to achieve its results. NoSQL systems are also distributed and capable of operating on large masses of data, but VoltDB still follows the rules of relational database systems. The 1.0 version was launched in May 2010.
In the 2.0 version, a query planner distributes query workloads across the server cluster, allowing queries to execute on the node where the needed data is stored. The optimization improves the performance of real-time analytic applications, Holahan said.
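The idea of running a query on the node that holds the data can be sketched as follows. The article does not say how VoltDB assigns data to nodes, so the hash partitioning below is an assumption, and the `Cluster` class and its methods are hypothetical names used only for illustration.

```python
import zlib


class Cluster:
    """Toy cluster: each node is a dict holding one partition of the data."""

    def __init__(self, n_nodes):
        self.nodes = [dict() for _ in range(n_nodes)]

    def node_for(self, key):
        # Assumed scheme: a stable hash, so a key always maps to the same node.
        return zlib.crc32(key.encode("utf-8")) % len(self.nodes)

    def put(self, key, value):
        self.nodes[self.node_for(key)][key] = value

    def get(self, key):
        # Single-key query: only the owning node does any work.
        return self.nodes[self.node_for(key)].get(key)

    def total(self):
        # Analytic query: each node sums its local rows,
        # and the coordinator combines the partial sums.
        partial_sums = [sum(node.values()) for node in self.nodes]
        return sum(partial_sums)


if __name__ == "__main__":
    cluster = Cluster(n_nodes=4)
    for i in range(100):
        cluster.put(f"user{i}", i)
    print(cluster.get("user42"), cluster.total())
```

The design point is that a lookup touches one node, while an aggregate moves only small partial results between nodes rather than the underlying rows.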
The 2.0 version also has the ability to stream data to another datastore, such as Hadoop or a relational OLAP system. To avoid impedance mismatches, where another system can't ingest the data as fast as VoltDB can dispense it, VoltDB can write data to an "overflow" disk, where it can be retrieved and delivered when the target system is ready.
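A rough sketch of the overflow idea: rows are held in a small in-memory buffer, spill to a file when the target falls behind, and are delivered in their original order once the target catches up. The `OverflowExporter` class, its parameters, and the JSON-lines file format are assumptions made for illustration and do not reflect VoltDB's actual export mechanism.

```python
import json
import os
import tempfile
from collections import deque


class OverflowExporter:
    """Toy exporter that spills rows to disk when the target cannot keep up."""

    def __init__(self, memory_limit, overflow_path):
        self.memory_limit = memory_limit
        self.overflow_path = overflow_path
        self.buffer = deque()  # oldest undelivered rows, held in memory
        self.spilled = 0       # rows in the overflow file not yet delivered
        self.read_offset = 0   # byte offset of the next unread row in the file

    def publish(self, row):
        # To preserve order, once any row has spilled, later rows spill too
        # until the overflow file has been fully drained.
        if self.spilled == 0 and len(self.buffer) < self.memory_limit:
            self.buffer.append(row)
        else:
            with open(self.overflow_path, "ab") as f:
                f.write(json.dumps(row).encode("utf-8") + b"\n")
            self.spilled += 1

    def drain(self, max_rows):
        """Return up to max_rows rows for the target system, oldest first."""
        out = []
        while self.buffer and len(out) < max_rows:
            out.append(self.buffer.popleft())
        if self.spilled and len(out) < max_rows:
            with open(self.overflow_path, "rb") as f:
                f.seek(self.read_offset)
                while self.spilled and len(out) < max_rows:
                    out.append(json.loads(f.readline()))
                    self.spilled -= 1
                self.read_offset = f.tell()
        return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        exporter = OverflowExporter(2, os.path.join(tmp, "overflow.jsonl"))
        for i in range(5):
            exporter.publish({"id": i})
        print(exporter.drain(3))   # the slow target takes three rows
        print(exporter.drain(10))  # later, it takes the rest
```

A production system would also need to truncate or rotate the overflow file and survive restarts, both of which this sketch leaves out.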
VoltDB is available both in a community edition as GPL open source code, and as a commercially supported enterprise edition for a subscription of $15,000 a year per four-server cluster. Only the enterprise edition includes Command Logging. Version 2.0 can run on a cluster with up to 39 servers and 300 cores.