The problem of limited bandwidth has created an ideal role for tape in the cloud. Tape can get you past the two jobs that strain an Internet link the most: the initial seeding of data and full server recovery. The effective "bandwidth" of a box of tapes shipped overnight dwarfs even the fastest Internet connection. Despite this, some cloud vendors claim to have solved the problem and are adding their voices to the "Tape Is Dead" chorus.
Backup vs. Bandwidth
Cloud vendors typically leverage deduplication and compression to minimize the amount of data that is sent across the wire. And it works -- very well. But for deduplication to work, an initial baseline backup needs to be completed. This can take days or even weeks -- during which time you risk data loss.
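To put rough numbers on that baseline window, here is a minimal back-of-the-envelope sketch. Every figure in it is a hypothetical assumption chosen for illustration, not a measurement: a 5 TB data set, a 100 Mbps uplink that sustains 80 percent of its rated speed, a 2 percent daily change rate, and a 4:1 reduction from deduplication and compression.

```python
def transfer_days(data_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Days needed to push data_tb terabytes over a link_mbps uplink.

    efficiency is an assumed fraction of the rated link speed that the
    backup stream actually sustains (protocol overhead, shared traffic).
    """
    bits = data_tb * 1e12 * 8                      # decimal terabytes to bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86_400                        # seconds per day


# Hypothetical site: 5 TB protected, 100 Mbps uplink.
baseline_days = transfer_days(data_tb=5, link_mbps=100)

# Hypothetical steady state: 2% of the data changes daily, and
# deduplication plus compression shrink that change 4:1.
daily_change_tb = 5 * 0.02 / 4
incremental_minutes = transfer_days(daily_change_tb, link_mbps=100) * 24 * 60

print(f"Baseline backup:   {baseline_days:.1f} days")
print(f"Daily incremental: {incremental_minutes:.0f} minutes")
```

Under these assumptions the baseline takes close to six days while each later backup finishes in under an hour, which is why the exposure is concentrated in that first week.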
Cloud backup vendors' response to this is the increasingly common hybrid deployment. In this scenario, an appliance (physical or virtual) is placed onsite and is allocated storage. The most recent backup, plus one full backup, is then stored both onsite and in the cloud. Because local storage is not limited by Internet bandwidth, these hybrid appliances complete your first backup quickly. Concurrently, they replicate that initial backup to the cloud, so although it might take a week to get your data offsite, a second copy of your data is already tucked away on the appliance.
The problem is that the second copy of data is not offsite. If during that week of initial replication you have a site disaster or the storage in the appliance fails, you've lost data -- potentially forever. Since most hybrid systems don't have the ability to create a portable copy of data that you can take offsite, that's a high-risk week, and IT pros are under enough stress as it is.
The simple addition of a tape device to the appliance would alleviate this risk. Once the tape copy is made, you could take it offsite. You could even use the tape to "seed" the cloud copy by overnighting it to the provider, and instead of a week-long sync, you're done in 24 hours.
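The claim that shipped tape outruns the wire can be checked with simple arithmetic. The sketch below uses hypothetical inputs: a box of 10 cartridges holding 2.5 TB each and a 24-hour door-to-door transit. It deliberately ignores the time needed to write the tapes onsite and read them back at the provider, so treat the result as an upper bound.

```python
def shipped_mbps(tapes: int, tb_per_tape: float, transit_hours: float) -> float:
    """Effective throughput, in megabits per second, of tapes in transit.

    Counts only the shipping time; writing and reading the cartridges
    would add to the elapsed time and lower the real figure.
    """
    bits = tapes * tb_per_tape * 1e12 * 8          # decimal terabytes to bits
    return bits / (transit_hours * 3600) / 1e6


# Hypothetical shipment: 10 cartridges of 2.5 TB, delivered in 24 hours.
courier = shipped_mbps(tapes=10, tb_per_tape=2.5, transit_hours=24)

print(f"Overnight box of tapes: about {courier:,.0f} Mbps")
print(f"Equivalent to {courier / 100:.0f} fully used 100 Mbps links")
```

With these inputs the courier delivers the equivalent of roughly 2.3 Gbps sustained for a full day, more than 20 times what a saturated 100 Mbps uplink could move in the same period.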
This initial seed is just part of the problem, and some users may be willing to risk a week's worth of exposure to potential data loss. But the bigger challenge is recovery. Deduplication won't do you any good in a recovery situation: if the server or site fails, the local data that deduplication would compare against is gone, so the full data set has to come back across the wire.
Some cloud backup providers claim that in-cloud recovery is the answer. What do you think? We'll discuss that problem in my next column.