Practical Disaster Recovery For Midsize Companies

In this InformationWeek Analytics report, we profile new technologies that can protect the business without breaking the bank.

Howard Marks, Network Computing Blogger

December 18, 2008


Disasters happen, and when they do, IT had better be prepared, since businesses depend on information and the technology that manages it. For midsize companies, however, planning and equipping for disasters have been problematic. Where large enterprises have an array of specialized disaster-recovery systems from which to choose, and small businesses often can make do with ad hoc measures, midsize companies frequently have been caught in the middle--not able to afford big-bucks systems, yet needing more than just sending tapes off-site.

Whether it's the lingering memory of disasters like Hurricane Katrina, a recent brownout that sent servers belly-up, or regulatory mandates requiring signatures from the top brass, CEOs are increasingly aware of the problem. At midsize companies, 79% of respondents to our InformationWeek Analytics survey say they have a business-continuity plan in place, though 42% still cite lack of C-level awareness as a barrier to disaster-recovery planning. Companies also vary widely in how quickly they expect to recover: 35% say they'd be back in a few days; 28% say it would be a few weeks.

Increasing midsize business preparedness means using new technologies that are changing the game for disaster recovery, but it also requires spending time classifying applications and managing expectations for the recovery process. The fundamental questions are which apps are required to run the business, how much data can be lost, and how long you can wait to restore functionality. (For a more extensive discussion, see our full report, which includes coverage of distance issues, critical personnel matters, and more.)

For example, a CAD system might be very important to a business, but perhaps nightly tape backups and a week to rebuild servers are sufficient in the face of a flood or tornado. Engineers can re-create any work that's lost, and after a local disaster, they'll be attending to their families' needs before digging back into the company's latest designs. Geography also can make a big difference: a company with engineers in more than one location can replicate data between the groups, so that while a disaster will certainly reduce overall productivity, little information will be lost.

CRITICAL METRICS TO MANAGE
Two key metrics to plan around are the recovery point objective and the recovery time objective, or RPO and RTO. The RPO defines how much data the business can accept losing irretrievably. In the CAD example above, as much as a day's worth of data may be lost, since backups are done only nightly. For an e-mail system, a company might be willing to lose only an hour's worth of data, and for a transaction-processing system, it might not be acceptable to lose any data at all.

Recovery time objective speaks to the time required to bring a system back up. While it may be OK to let the CAD system wait a week, the e-mail system is critical. Here, the system itself actually has a different RTO than its data does. Employees can get lots of useful work done as long as e-mail is running--two-thirds of companies have messaging in their disaster plans--even without access to older messages. Transaction-processing systems may need to be back in operation in minutes.
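To make those targets concrete, here's a minimal sketch (in Python) of how a planning team might record and sanity-check RPO/RTO tiers. The applications, time values, and helper names below are hypothetical illustrations, not figures from our survey.

from datetime import timedelta

# Hypothetical RPO/RTO targets per application -- illustrative values only.
recovery_objectives = {
    "cad":          (timedelta(days=1),    timedelta(weeks=1)),    # (RPO, RTO)
    "email":        (timedelta(hours=1),   timedelta(hours=4)),
    "transactions": (timedelta(seconds=0), timedelta(minutes=15)),
}

def plan_gaps(app, backup_interval, estimated_restore_time):
    """Flag an app whose backup cadence or restore estimate misses its targets."""
    rpo, rto = recovery_objectives[app]
    gaps = []
    if backup_interval > rpo:
        gaps.append(f"RPO miss: backups every {backup_interval}, target {rpo}")
    if estimated_restore_time > rto:
        gaps.append(f"RTO miss: restore takes {estimated_restore_time}, target {rto}")
    return gaps

# Nightly backups and a two-day rebuild would miss both e-mail targets.
print(plan_gaps("email", timedelta(hours=24), timedelta(days=2)))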

IT teams can't determine the RPO and RTO of their systems in a vacuum; they need to survey users and consult line-of-business managers and executives on the value of lost time and data. The different RPOs and RTOs must be considered when designing an application, not just in how it's backed up. For instance, a transaction-processing e-commerce system with very low RPOs and RTOs might be best run in a co-location facility that has redundant Internet and power grid connections, generators, and adequate security--items many midsize businesses can't afford in their own data centers. Some midsize companies, particularly in places such as the San Francisco area and the Gulf Coast, use co-location facilities for part of their data center needs and use their local sites as the disaster-recovery backup.

Chart: What are the top barriers to adoption of a business continuity plan?

If companies want to reduce RPOs from hours to minutes, they need to replicate write requests in real time to a duplicate store at their disaster-recovery sites. Replication products vary in how they balance cost, bandwidth requirements, RPO, and ease of management for a given set of uses. There are three commonly used data replication methods: synchronous, asynchronous, and snapshot.

Synchronous replication systems duplicate each write request, sending it to both the primary and secondary data stores, and wait for both stores to complete the write before considering the transaction fully processed. This brings RPO to zero, but the cost is high, and--if not designed properly--such products can substantially slow application performance. These systems typically are used by large organizations and employ dark fiber or some other high-performance network.
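As a rough sketch of that behavior--with Python dictionaries standing in for disk arrays and a sleep standing in for WAN latency, not any vendor's actual product--the application's acknowledgment waits on the remote copy:

import time

primary, secondary = {}, {}  # stand-ins for local and DR-site storage

def wan_write(store, key, value, delay=0.05):
    """Simulate a write that must cross the long-haul link."""
    time.sleep(delay)  # round-trip latency the application must wait out
    store[key] = value

def synchronous_write(key, value):
    primary[key] = value               # local write completes quickly
    wan_write(secondary, key, value)   # block until the DR copy is confirmed
    return "committed"                 # only now is the transaction done

synchronous_write("order-1001", "shipped")
# RPO is zero: no acknowledged write exists only at the primary site.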

Asynchronous replication also drives RPO toward zero, but does so without waiting for every write to be acknowledged from both local and remote data stores. It often includes management and optimization techniques to minimize bandwidth use.
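A comparable sketch of the asynchronous pattern--again purely illustrative--acknowledges the write as soon as the local copy lands and drains a queue to the remote site in the background:

import queue, threading, time

primary, secondary = {}, {}
pending = queue.Queue()  # writes awaiting shipment to the DR site

def asynchronous_write(key, value):
    primary[key] = value        # acknowledge once the local copy lands
    pending.put((key, value))   # replicate later; the WAN no longer stalls the app
    return "committed locally"

def replicate_forever():
    while True:
        key, value = pending.get()
        time.sleep(0.05)        # WAN latency paid in the background
        secondary[key] = value
        pending.task_done()

threading.Thread(target=replicate_forever, daemon=True).start()
asynchronous_write("order-1002", "shipped")
pending.join()
# RPO is near zero but not zero: a disaster loses whatever is still queued.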

Snapshot technologies work at the storage level, transferring modified blocks of data at set intervals. These systems are more bandwidth-friendly and often work without application awareness. While SAN-based systems are the gold standard for data replication, they often cost more than midsize organizations can afford.
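Mechanically, the changed-block transfer at the heart of snapshot replication can be sketched like this; the block numbers, interval, and dirty-set bookkeeping are invented for illustration:

primary_blocks = {}   # block number -> data
replica_blocks = {}
dirty = set()         # blocks modified since the last snapshot shipped

def write_block(block_no, data):
    primary_blocks[block_no] = data
    dirty.add(block_no)   # remember the block for the next interval

def ship_snapshot():
    """Run at a set interval (say, every 15 minutes): only modified
    blocks cross the wire, which is what keeps bandwidth needs low."""
    for block_no in sorted(dirty):
        replica_blocks[block_no] = primary_blocks[block_no]
    dirty.clear()

write_block(7, "new CAD revision")
write_block(7, "newer CAD revision")  # repeated writes coalesce: block 7 ships once
ship_snapshot()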

Increasingly, host-based and appliance-based replication, or systems based on server virtualization, are attractive to midsize companies. Not only is the price lower, but virtualization can greatly reduce the time and equipment needed to bring up a disaster recovery site.
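Back-of-the-envelope arithmetic shows why. The server counts and utilization figures below are invented for illustration, not data from our survey, but the consolidation effect is the point:

import math

physical_servers = 20    # production boxes the DR site must cover
avg_utilization = 0.15   # hypothetical light load on each box
dr_host_capacity = 0.70  # how heavily we're willing to load a DR host

# Without virtualization, the DR site needs matching hardware per server.
dr_hosts_physical = physical_servers

# With virtualization, the consolidated load sets the DR host count.
dr_hosts_virtual = math.ceil(physical_servers * avg_utilization / dr_host_capacity)

print(dr_hosts_physical, "vs", dr_hosts_virtual)  # 20 vs 5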

Chart: What parts of your IT infrastructure are covered by your business continuity or disaster recovery plan? (See the full report at data-protection.informationweek.com.)

About the Author

Howard Marks

Network Computing Blogger

Howard Marks is founder and chief scientist at DeepStorage LLC, a storage consultancy and independent test lab based in Santa Fe, N.M., concentrating on storage and data center networking. In more than 25 years of consulting, Marks has designed and implemented storage systems, networks, management systems, and Internet strategies at organizations including American Express, J.P. Morgan, Borden Foods, U.S. Tobacco, BBDO Worldwide, Foxwoods Resort Casino, and the State University of New York at Purchase. The testing at DeepStorage Labs is informed by that real-world experience.

He has been a frequent contributor to Network Computing and InformationWeek since 1999 and a speaker at industry conferences including Comnet, PC Expo, Interop, and Microsoft's TechEd since 1990. He is the author of Networking Windows and co-author of Windows NT Unleashed (Sams).

He is co-host, with Ray Lucchesi, of the monthly Greybeards on Storage podcast, where the voices of experience discuss the latest issues in the storage world with industry leaders. You can find the podcast at http://www.deepstorage.net/NEW/GBoS.
