Practical Disaster Recovery For Midsize Companies
In this InformationWeek Analytics report, we profile new technologies that can protect the business without breaking the bank.
Disasters happen, and when they do, IT had better be prepared, since businesses depend on information and the technology that manages it. For midsize companies, however, planning and equipping for disasters has been problematic. Where large enterprises have an array of specialized disaster-recovery systems from which to choose, and small businesses often can make do with ad hoc measures, midsize companies frequently have been caught in the middle--not able to afford big-bucks systems, yet needing more than just sending tapes off-site.
Whether it's the lingering memory of disasters like Hurricane Katrina, a recent brownout that sent servers belly-up, or regulatory mandates requiring signatures from the top brass, CEOs are increasingly aware of the problem, though awareness is uneven. At midsize companies, 79% of respondents to our InformationWeek Analytics survey say they have a business-continuity plan in place, yet 42% cite lack of C-level awareness as a barrier to disaster-recovery planning. Companies also vary widely in how quickly they could recover: 35% say they'd be back in a few days; 28% say it would take a few weeks.
Increasing midsize business preparedness means using new technologies that are changing the game for disaster recovery, but it also requires spending time classifying applications and managing expectations for the recovery process. The fundamental questions are which apps are required to run the business, how much data can be lost, and how long you can wait to restore functionality. (For a more extensive discussion, see our full report, which includes discussions of distance issues, critical personnel matters, and more.)
For example, a CAD system might be very important to a business, but nightly tape backups and a week to rebuild servers may be sufficient in the face of a flood or tornado. Engineers can re-create any work that's lost, and after a local disaster, they'll be attending to their families' needs before digging back into the company's latest designs. Geography also can make a big difference. A company with engineers in more than one location will prefer to replicate data between the groups; while a disaster will certainly reduce overall productivity, little information will be lost.
CRITICAL METRICS TO MANAGE
Two key metrics to plan around are the recovery point objective and the recovery time objective, or RPO and RTO. The RPO defines how much data the business can afford to lose irretrievably. In the CAD example above, as much as a day's worth of data may be lost, since backups are done only nightly. For an e-mail system, a company might be willing to lose only an hour's worth of data, and for a transaction-processing system, it might not be acceptable to lose any data at all.
Recovery time objective defines the time required to bring a system back up. While it may be OK to let the CAD system wait a week, the e-mail system is critical. Here, the system itself actually has a different RTO than does its data. Employees can get lots of useful work done if e-mail's running--two-thirds of companies have messaging in their disaster plans--even without access to their old messages. Transaction-processing systems may need to be back in operation in minutes.
IT teams can't determine the RPO and RTO of their systems in a vacuum; they need to survey users and consult line-of-business managers and executives on the value of lost time and data. The different RPOs and RTOs must be considered when designing an application, not just in how it's backed up. For instance, a transaction-processing e-commerce system with very low RPO and RTO targets might be best run in a co-location facility that has redundant Internet and power grid connections, generators, and adequate security--items many midsize businesses can't afford in their own data centers. Some midsize companies, particularly in places such as the San Francisco area and the Gulf Coast, use co-location facilities for part of their data center needs and use their local sites as the disaster-recovery backup.
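The classification exercise above can be sketched as a simple inventory that ranks systems by recovery priority. The system names and minute values here are hypothetical, drawn loosely from the CAD, e-mail, and transaction-processing examples; real numbers come from surveying users and line-of-business managers.

```python
from dataclasses import dataclass

@dataclass
class SystemPlan:
    name: str
    rpo_minutes: int  # max acceptable data loss, in minutes
    rto_minutes: int  # max acceptable downtime, in minutes

# Hypothetical tiers for illustration only.
plans = [
    SystemPlan("CAD", rpo_minutes=24 * 60, rto_minutes=7 * 24 * 60),
    SystemPlan("e-mail", rpo_minutes=60, rto_minutes=4 * 60),
    SystemPlan("transactions", rpo_minutes=0, rto_minutes=15),
]

# Bring the tightest-RTO systems back first.
recovery_order = sorted(plans, key=lambda p: p.rto_minutes)
for p in recovery_order:
    print(f"{p.name}: restore within {p.rto_minutes} min, "
          f"tolerate {p.rpo_minutes} min of data loss")
```

Even a spreadsheet-level inventory like this makes the trade-offs explicit when negotiating budgets with executives.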
If companies want to reduce RPOs from hours to minutes, they need to replicate write requests in real time to a duplicate store at their disaster-recovery sites. There are various replication products available, each designed to provide the best balance of cost, bandwidth requirements, RPO, and ease of management for a given set of uses. There are three commonly used data replication methods: synchronous, asynchronous, and snapshot.
Synchronous replication systems duplicate each write request, sending it to both the primary and the secondary data store, and wait for both stores to complete the write before considering the transaction fully processed. This brings RPO to zero, but the cost is high, and--if not designed properly--such products can substantially slow application performance. These systems typically are used by large organizations and employ dark fiber or some other high-performance network.
Asynchronous replication also drives RPO toward zero, but does so without waiting for every write to be acknowledged from both local and remote data stores. It often includes management and optimization techniques to minimize bandwidth use.
Snapshot technologies work at the storage level, transferring modified blocks of data at set intervals. These systems are more bandwidth-friendly and often work without application awareness. While SAN-based systems are the gold standard for data replication, they often cost more than midsize organizations can afford.
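The difference among the three methods comes down to when the write is acknowledged and when the second copy lands. The sketch below illustrates that timing trade-off with toy in-memory "stores"; all names are illustrative, not any vendor's API, and a real product replicates over a network link rather than between dictionaries.

```python
import queue
import threading

primary, secondary = {}, {}

# Synchronous: acknowledge only after BOTH copies land (RPO = 0).
def synchronous_write(key, value):
    primary[key] = value
    secondary[key] = value  # in practice, blocks on the remote site
    return "acknowledged"

# Asynchronous: acknowledge after the local write; a background
# worker drains the log, so a crash can lose still-queued writes.
replication_log = queue.Queue()

def asynchronous_write(key, value):
    primary[key] = value
    replication_log.put((key, value))
    return "acknowledged"

def replicator():
    while True:
        k, v = replication_log.get()
        secondary[k] = v
        replication_log.task_done()

threading.Thread(target=replicator, daemon=True).start()

# Snapshot: at set intervals, ship only blocks modified since the
# last transfer; RPO equals the snapshot interval.
dirty = set()

def tracked_write(key, value):
    primary[key] = value
    dirty.add(key)

def take_snapshot():
    for key in dirty:
        secondary[key] = primary[key]
    dirty.clear()

synchronous_write("order-1", "paid")
asynchronous_write("order-2", "shipped")
tracked_write("order-3", "pending")
take_snapshot()
replication_log.join()  # wait for the async copy to land
```

The bandwidth story follows from the code: synchronous ships every write immediately, asynchronous ships every write eventually, and snapshot ships only the net change per interval.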
Increasingly, host-based and appliance-based replication, as well as systems built on server virtualization, are attractive to midsize companies. Not only is the price lower, but virtualization can greatly reduce the time and equipment needed to bring up a disaster-recovery site.
>> See the full report at data-protection.informationweek.com <<