Whether it's the lingering memory of disasters like Hurricane Katrina, a recent brownout that sent servers belly-up, or regulations that require signatures from the top brass, CEOs are increasingly aware of the problem. At midsize companies, 79% of respondents to our InformationWeek Analytics survey say they have a business-continuity plan in place, and 42% cite lack of C-level awareness as a barrier to disaster-recovery planning. However, companies vary widely in how quickly they can recover: 35% say they'd be back in a few days; 28% say it would be a few weeks.
For example, a CAD system might be very important to a business, but perhaps nightly tape backups and a week to rebuild servers are sufficient in the face of a flood or tornado. Engineers can re-create any work that's lost, and after a local disaster, they'll be attending to their families' needs before digging back into the company's latest designs. Geography also can make a big difference. A company with engineers in more than one location may prefer to replicate data between the groups; a disaster will certainly reduce overall productivity, but little information will be lost.
CRITICAL METRICS TO MANAGE
Recovery time objective (RTO) speaks to the time required to bring a system back up. While it may be OK to let the CAD system wait a week, the e-mail system is critical. Here, the system itself actually has a different RTO than does its data. Employees can get lots of useful work done if e-mail's running--two-thirds of companies have messaging in their disaster plans--even without access to older e-mail files. Transaction-processing systems may need to be back in operation in minutes.
IT teams can't determine the RPO and RTO of their systems in a vacuum; they need to survey users and consult line-of-business managers and executives on the value of lost time and data. The different RPOs and RTOs must be considered when designing an application, not just when deciding how it's backed up. For instance, a transaction-processing e-commerce system with very low RPOs and RTOs might be best run in a co-location facility that has redundant Internet and power grid connections, generators, and adequate security, items many midsize businesses can't afford in their own data centers. Some midsize companies, particularly in places such as the San Francisco area and the Gulf Coast, use co-location facilities for part of their data center needs and use their local sites as the disaster-recovery backup.
Synchronous replication systems duplicate each write request, sending it to both the primary and secondary data stores. They wait for both stores to complete the write before considering the transaction fully processed. This brings RPO to zero, but the cost is high, and--if not designed properly--such products can substantially slow application performance. These systems typically are used by large organizations and employ dark fiber or some other high-performance network.
Asynchronous replication also drives RPO toward zero, but does so without waiting for every write to be acknowledged from both local and remote data stores. It often includes management and optimization techniques to minimize bandwidth use.
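The difference between the two approaches can be sketched in a few lines of Python. This is a simplified illustration, not any vendor's implementation: the Store, SyncReplicator, and AsyncReplicator classes are hypothetical, and in-memory dictionaries stand in for the primary and secondary data stores.

```python
from collections import deque


class Store:
    """In-memory stand-in for a data store (hypothetical)."""

    def __init__(self):
        self.blocks = {}

    def write(self, key, value):
        self.blocks[key] = value
        return True  # acknowledgment that the write completed


class SyncReplicator:
    """Treats a write as done only after both stores acknowledge it."""

    def __init__(self, primary, secondary):
        self.primary, self.secondary = primary, secondary

    def write(self, key, value):
        ack_primary = self.primary.write(key, value)
        # In a real deployment this call crosses the network, so the
        # application waits on the round trip to the remote site.
        ack_secondary = self.secondary.write(key, value)
        return ack_primary and ack_secondary


class AsyncReplicator:
    """Acknowledges after the local write and ships the change later."""

    def __init__(self, primary, secondary):
        self.primary, self.secondary = primary, secondary
        self.pending = deque()  # writes not yet applied remotely

    def write(self, key, value):
        ack = self.primary.write(key, value)
        self.pending.append((key, value))
        return ack  # returns without waiting for the remote store

    def drain(self):
        # Apply queued writes to the secondary in their original order.
        while self.pending:
            self.secondary.write(*self.pending.popleft())


primary, remote = Store(), Store()
repl = AsyncReplicator(primary, remote)
repl.write("order-1", "paid")
at_risk = len(repl.pending)  # writes lost if the primary failed right now
repl.drain()

sync_primary, sync_remote = Store(), Store()
SyncReplicator(sync_primary, sync_remote).write("order-1", "paid")
```

In this sketch, whatever sits in the pending queue is the data at risk if the primary site fails, which is why asynchronous replication approaches an RPO of zero without guaranteeing it.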
Snapshot technologies work at the storage level, transferring modified blocks of data at set intervals. These systems are more bandwidth-friendly and often work without application awareness. While SAN-based systems are the gold standard for data replication, they often cost more than midsize organizations can afford.
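As a rough illustration of why interval-based snapshots are bandwidth-friendly, the Python sketch below sends only the blocks that changed since the previous snapshot. The function names are hypothetical, a dictionary of block addresses stands in for a storage volume, and the sketch assumes blocks are modified but never removed.

```python
def changed_blocks(previous, current):
    """Return the blocks whose contents differ from the previous snapshot."""
    return {addr: data for addr, data in current.items()
            if previous.get(addr) != data}


def replicate_snapshot(volume, last_snapshot, replica):
    """Copy modified blocks to the replica; return the new snapshot and count sent."""
    delta = changed_blocks(last_snapshot, volume)
    replica.update(delta)  # only the delta crosses the wire
    return dict(volume), len(delta)


volume = {0: b"boot", 1: b"data-v1", 2: b"logs"}
replica = {}

# First interval: nothing has been sent yet, so every block is transferred.
snapshot, first_sent = replicate_snapshot(volume, {}, replica)

# One block changes before the next interval; only that block is transferred.
volume[1] = b"data-v2"
snapshot, second_sent = replicate_snapshot(volume, snapshot, replica)
```

The interval between snapshots sets the effective RPO: anything written after the last transfer exists only at the primary site.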
Increasingly, host-based and appliance-based replication, or systems based on server virtualization, are attractive to midsize companies. Not only is the price lower, but virtualization can greatly reduce the time and equipment needed to bring up a disaster recovery site.
Two key metrics to plan around are the recovery point objective and the recovery time objective, or RPO and RTO. The RPO speaks to how much data a company can accept losing for good. In the CAD example above, as much as a day's worth of data may be lost, since backups are only done nightly. For an e-mail system, a company might be willing to lose only an hour's worth of data, and for a transaction-processing system, it might not be acceptable to lose any data.
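One way to put these objectives to work is to compare each system's targets against what its current backup scheme can deliver. The Python sketch below is a minimal example under stated assumptions: the SystemPlan fields and the gaps function are hypothetical, the CAD figures follow the example above, and the e-mail RTO and restore time are made-up values for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class SystemPlan:
    name: str
    rpo: timedelta              # maximum acceptable data loss
    rto: timedelta              # maximum acceptable downtime
    backup_interval: timedelta  # time between backups or replication cycles
    restore_time: timedelta     # estimated time to bring the system back up


def gaps(plan):
    """Return the objectives the current plan fails to meet."""
    problems = []
    # Worst case, a failure hits just before the next backup runs,
    # so the data at risk equals the full backup interval.
    if plan.backup_interval > plan.rpo:
        problems.append("RPO")
    if plan.restore_time > plan.rto:
        problems.append("RTO")
    return problems


plans = [
    # Nightly tape and a week to rebuild, as in the CAD example.
    SystemPlan("CAD", rpo=timedelta(days=1), rto=timedelta(weeks=1),
               backup_interval=timedelta(days=1),
               restore_time=timedelta(weeks=1)),
    # One-hour RPO from the text; RTO and restore time are assumed values.
    SystemPlan("e-mail", rpo=timedelta(hours=1), rto=timedelta(hours=4),
               backup_interval=timedelta(days=1),
               restore_time=timedelta(hours=2)),
]

for plan in plans:
    print(plan.name, gaps(plan) or "meets objectives")
```

Here nightly backups satisfy the CAD system but leave the e-mail system short of its one-hour RPO, the kind of gap that points toward replication rather than tape.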
