The Role of Disk in Backup
Most day-to-day restore requests do not involve recovery of complete servers or even virtual machines, but rather a file or two that was accidentally deleted or saved over. This is an ideal use case for disk, which provides quick access to a very small piece of data.
Restores that do involve recovery of a complete server or virtual machine are less common, but more critical. In these cases, restores--even from disk--may be too slow. The current generation of virtualization-focused backup utilities can now boot the virtual machine image directly from the backup disk, without transferring data first.
Another area where disk has an advantage is off-site replication. Modern backup applications and disk appliances that implement deduplication can leverage that technology to replicate data to a remote site or provider. Getting the working set off-site soon after the backup completes is another key advantage.
The Cost of the Last 10%
While 90% of restore requests involve only the most recent copy of data, the remaining 10% of requests are what drives up backup costs. These restore requests involve a previous version or versions of a particular file, and satisfying them requires backup storage that is typically 5 to 10 times the size of production storage. (Put another way, 90% of restore requests come from the most recent copy of the data--the 1X part--while the remaining 10% come from the remaining 4X-9X.) This 10% of restore requests typically occurs in response to legal discovery or research needs, not disasters, so the speed at which they can be started is generally not an important factor.
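To make the retention arithmetic concrete, here is a minimal sketch using illustrative, assumed numbers (a 100 TB production environment and a 10X retention multiple are hypotheticals, not figures from any vendor):

```python
# Illustrative sketch: how retention multiples drive backup capacity.
# The production size and retention multiple below are assumptions
# chosen only to demonstrate the 1X vs. 4X-9X split.

def backup_capacity(production_tb, retention_multiple):
    """Split total backup capacity into the most recent copy (1X)
    and the older retained copies ((retention_multiple - 1)X)."""
    working_set = production_tb                      # the 1X part
    older_copies = production_tb * (retention_multiple - 1)
    return working_set, older_copies

working, older = backup_capacity(production_tb=100, retention_multiple=10)
print(working)  # 100 -- the 1X copy that serves ~90% of restores
print(older)    # 900 -- the remaining 9X that serves ~10% of restores
```

The point of the split is that the small 1X portion handles nearly all restore traffic, while the large remainder is touched rarely.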
The disk backup appliance industry has invested heavily in making disk capable of handling this final set of restore requests. Technologies like deduplication and scale-out storage make it possible for disk to do the job. But just because something is technologically possible, does it make sense to do it?
The Case for Tape
As I will discuss in the webinar The Four Reasons The Data Center is Returning To Tape, tape can lower the overall cost of backup storage as well as increase reliability. Tape is an ideal location for the 4X-9X copies of data that are not stored on disk. With LTO-6, tape capacity will move to 8TB per cartridge, making the cost and space required to store that remaining data very affordable.
What about performance and reliability? As I discussed in my recent article, LTO-6 tape can be written to at 210 MB/s. The concern for most users is latency--the time it takes to mount and position a tape. For that 10% of restores, however, the time it takes to get started is not a primary issue, and once tape gets moving, it can restore data at least as fast as disk can.
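A quick back-of-the-envelope calculation shows why mount latency matters little for large restores once the drive is streaming. The restore size and the two-minute mount penalty below are assumptions for illustration:

```python
# Rough restore-time comparison: tape mount latency vs. streaming time.
# The restore size and the 120-second mount delay are assumed values.

def restore_seconds(size_gb, throughput_mb_s, startup_s=0):
    """Total restore time: fixed startup delay plus transfer time."""
    return startup_s + (size_gb * 1024) / throughput_mb_s

tape = restore_seconds(size_gb=2000, throughput_mb_s=210, startup_s=120)
disk = restore_seconds(size_gb=2000, throughput_mb_s=210)

# For a 2 TB restore at 210 MB/s, streaming alone takes roughly
# 2.7 hours, so a two-minute mount adds only about 1% to the total.
```

For the large, non-urgent restores in the 10% category, the transfer time dominates and the mount delay disappears into the noise.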
Reliability issues are more a function of how tape is handled and transported than of the physical media itself. Tape has consistently been shown to have lower error rates than disk, and disk error rates become substantially higher as the capacity per drive increases. Libraries greatly reduce and can even eliminate media handling.
Instead of attempting an all-in-one solution, it might make sense to design a backup strategy that uses disk to store the most recent copy of data, and tape for everything else. This is especially true in larger data centers where the size of that 9X is measured in petabytes.
The advantage of this strategy is that it lets both technologies do what they do best: disk handles the front end while tape stores long-term copies. Deduplication on disk can still be leveraged to increase the number of working set copies that are maintained on disk. In this design, disk and tape complement each other. Disk backup appliances that support direct streaming to tape should be able to stream tape drives at LTO-6 speeds, so copies to tape can occur very quickly.
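A hedged cost sketch illustrates why moving the 9X to tape pays off at scale. The per-TB prices below are assumptions invented for comparison, not quotes or vendor figures:

```python
# Illustrative cost model for disk-only vs. tiered disk+tape backup.
# Both per-TB prices are assumed values for demonstration only.

DISK_COST_PER_TB = 300.0   # assumed deduplicating disk appliance cost
TAPE_COST_PER_TB = 30.0    # assumed amortized tape media/library cost

def disk_only(total_tb):
    """Cost of keeping the entire retention set on disk."""
    return total_tb * DISK_COST_PER_TB

def tiered(working_tb, older_tb):
    """Cost of keeping the 1X on disk and the older copies on tape."""
    return working_tb * DISK_COST_PER_TB + older_tb * TAPE_COST_PER_TB

print(disk_only(1000))   # 300000.0 -- 1000 TB entirely on disk
print(tiered(100, 900))  # 57000.0  -- 100 TB on disk, 900 TB on tape
```

Even with generous assumptions about dedupe reducing the disk footprint, the gap widens as the retained set grows into petabytes.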
Reintroducing tape into the backup process should slow the pace at which you add disk backup appliances, since the working set of data grows much more slowly than retained data does. It should also allow you to meet demands for increased retention time cost-effectively.