As companies use more cloud vendors and big data migrates into mission-critical applications, disaster recovery (DR) plans need to be reinvented, and perhaps torn down and rebuilt altogether.
What are the new elements of DR that CIOs need to address, and how can IT avoid the historical pitfall of DR: not getting it done because there are too many other projects to do? Here are ten best practices for CIOs to consider in the hybrid cloud computing age:
Develop a hybrid cloud DR strategy that accompanies your hybrid cloud plan. In March, 451 Research surveyed European companies and reported that 80% were using multiple cloud vendors for their IT, but only 66% had formal hybrid computing plans. A second survey, from InvenioIT, reported that nearly one-third of companies had no DR plan in place, and that one-third of the companies that did have a DR plan still felt unprepared for a disaster. It isn’t much of a stretch, then, to surmise that most companies that have been informally moving apps to the cloud have not kept their DR plans in step. The takeaway for CIOs is that the DR plan should be updated with a DR and failover scenario for each new cloud vendor before that vendor’s cloud is added to your IT architecture.
Include DR as a line item in your cloud vendor RFPs. Because the ability to execute disaster recovery and failover is critical to a company’s well-being, IT managers should include DR plan and DR test requirements in any RFP issued to a cloud vendor. If the vendor can't meet your DR and failover plan and test requirements, you should find another vendor.
Require vendor SLAs on DR. Today, most cloud vendors will issue a standard service level agreement (SLA) commitment for mean time to recovery, mean time to response, etc., but as recently as five years ago, many cloud vendors didn’t. As part of your contract with a cloud vendor, insist on a set of SLAs that the two of you can agree to. That set should include an SLA that addresses disaster recovery and states the timeframe within which the vendor guarantees that your applications and data will be recovered and running again. For mission-critical systems, this SLA commitment is vital. If a prospective vendor can't meet your own internal recovery requirements, continue to seek out other vendors.
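The comparison between a vendor's recovery commitments and your internal recovery requirements can be made explicit. The following is a minimal Python sketch, not a prescribed method. The `RecoveryTarget` type, the `sla_gaps` function, the application names, and all hour figures are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTarget:
    """Recovery commitment for one application (hypothetical structure)."""
    app: str
    max_recovery_hours: float  # longest acceptable or guaranteed time to restore service


def sla_gaps(internal, vendor):
    """Return (app, reason) pairs where the vendor commitment is missing or too slow."""
    vendor_by_app = {target.app: target for target in vendor}
    gaps = []
    for need in internal:
        offer = vendor_by_app.get(need.app)
        if offer is None:
            gaps.append((need.app, "no vendor commitment"))
        elif offer.max_recovery_hours > need.max_recovery_hours:
            gaps.append(
                (need.app,
                 f"vendor {offer.max_recovery_hours}h > required {need.max_recovery_hours}h")
            )
    return gaps


# Hypothetical figures for illustration only.
internal_requirements = [
    RecoveryTarget("erp", 4),
    RecoveryTarget("crm", 24),
    RecoveryTarget("payroll", 8),
]
vendor_sla = [
    RecoveryTarget("erp", 8),
    RecoveryTarget("crm", 12),
]

for app, reason in sla_gaps(internal_requirements, vendor_sla):
    print(f"{app}: {reason}")
```

Any application this check reports is a point to negotiate before signing, or a reason to keep looking at other vendors.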
Adopt a continuous revision strategy for your DR plan. IT is constantly changing, and that change is accelerating as more apps move to the cloud. Rewriting a DR plan to keep pace with IT change is not easy. When asked, IT managers will tell you that DR is paramount, but they will also yield to immediate business pressures and move other mission-critical projects ahead of DR until DR is neglected. Last year, I asked 10 different company IT leaders how often they reviewed, revised and tested their DR plans. Only one could tell me that it had been done within the past 12 months. A CIO can roll the dice and hope that nothing too severe occurs, or rely on internal system know-how and survival strategies to get by if a system goes offline, but that gamble becomes harder when systems are moved to the cloud and control is ceded to someone else.
Test DR with your vendors. It is not enough to have a DR SLA with a cloud vendor. Your contract with the vendor should also include an annual DR test with that vendor. A CIO colleague told me about a DR test with a vendor that failed when his staff ran it. “We were using a hybrid computing strategy,” he said. “We hosted our production ERP on premises in our data center, but we had a DR agreement with a cloud vendor that enabled us to go to the cloud if our internal system failed. When we tested the DR, it didn't work, even though we were both using the same hardware. We found that there were some differences in system software versions between the vendor’s configuration and ours, enough that the app wouldn’t fail over. We wouldn't have known about this if we hadn't tested.”
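A version mismatch like the one in this anecdote can often be caught before a failover test by comparing software inventories from both sides. The sketch below assumes each side can export a simple mapping of component name to version string. The `version_mismatches` function, the component names, and the version numbers are hypothetical, and a check like this supplements an actual DR test rather than replacing it.

```python
def version_mismatches(on_prem, vendor):
    """Map each differing component to (on_prem_version, vendor_version).

    A component that is missing on one side is reported with None for that side.
    """
    mismatches = {}
    for component in sorted(set(on_prem) | set(vendor)):
        ours = on_prem.get(component)
        theirs = vendor.get(component)
        if ours != theirs:
            mismatches[component] = (ours, theirs)
    return mismatches


# Hypothetical inventories; real ones would come from each side's configuration records.
on_prem_stack = {"os": "8.6", "database": "19.3", "app_server": "9.0.1"}
vendor_stack = {"os": "8.6", "database": "19.5", "middleware": "2.4"}

for component, (ours, theirs) in version_mismatches(on_prem_stack, vendor_stack).items():
    print(f"{component}: on-prem={ours} vendor={theirs}")
```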
Review and revise DR plans with vendors annually. To keep your DR plan current with that of every cloud vendor you use, arrange an annual meeting with each vendor to go over DR so that your plan and the vendor’s plan stay in sync.
Ask vendors about multiple failover sites. Due diligence on any cloud vendor should include whether the vendor has multiple data centers in different geographic regions to which it can fail over. A strategy that has worked well for companies is to find a cloud vendor that has a local presence near corporate headquarters and at least one secondary data center in a different geographical location. This lowers the risk of putting all of your eggs in one basket if a natural or other disaster affects a specific geographical region.
Have your IT audit firm review your DR plan for vulnerabilities. An outside IT audit firm can serve as an extra set of eyes to review your DR plan and identify any exposures you might have in using an assortment of third-party cloud vendors as part of a hybrid IT architecture. The audit firm will come back with a set of action items that you and your vendors can work on.
Include cloud vendors in your corporate risk assessments and board reports. Risk assessments, including those that concern IT, are now standard reporting items for boards of directors. It's likely that the CIO has already briefed the board on the strategic importance of moving more apps to the cloud, along with the need for a hybrid IT architecture. The next logical step is to brief the board on the risks that come with adopting a cloud IT strategy, and how the corporate DR plan is positioned to address these risks.
Don't forget about your DR public relations plan. Whether DR planning covers on-premises systems or systems that have been consigned to the cloud, one of the most often overlooked items in IT DR plans is coordination with internal PR/marketing. The two groups need a communications and messaging plan for the board, stakeholders, employees, and customers if a major system goes offline. A carefully crafted IT-PR strategy that relays timely information to all of these parties can do much to maintain confidence in the company and to restore peace of mind if disaster strikes.