The Most Crucial Elements of Any Cyber-Resilience Testing Plan
By taking a methodical approach to resiliency testing, organizations can vastly improve their ability to bounce back quickly and securely.
When it comes to cybersecurity, resilience may prove even more crucial to a business than any mere defense.
Trying to keep bad actors out is important, of course. But ransomware and other forms of attacks are becoming more pervasive -- even inevitable. And the ability to get data systems back online after an incident is just as crucial as trying to detect and defend against an attack in the first place.
But the only way to truly know the readiness and resiliency of the IT environment is to test it. The trouble is, too few companies conduct adequate testing, if they do it at all. No amount of preparation can ever handle all contingencies, of course. However, by taking a methodical approach to resiliency testing, organizations can vastly improve their ability to bounce back quickly and securely.
It’s a lot like training a hockey team. At first, the plays may seem chaotic. Players don’t immediately grasp how their individual movements contribute to the team’s overall flow. But by breaking it down into processes, and then perfecting each one through practice, the coordination improves. And increasingly, the team can execute more complex plays.
Understanding the Landscape
When building a testing strategy, companies should map their recovery operations into three segments: the people, the process, and the technology that's involved.
Understanding how they all must work together will lead to a stronger and faster recovery:
People: These should be the roles that are critical to returning to a safe state as quickly as possible. This could include investigators, forensics analysts, and the security response team.
Process: Many companies will have some incident response plan in place. But in the chaos of an incident, these well-thought-out protocols can quickly break down. That’s why testing is so critical. Companies should put the security and IT teams through trial runs, including tabletop exercises. Then, they can pinpoint any troublesome areas. And they can start to detect areas for potential automation.
Technology: Businesses need to know what their most critical IT systems are. These are the programs that, in the event of an attack, natural disaster or other incident, must be back up and running the fastest. Often, these aren’t apparent to the security teams. End users can help pinpoint the most critical apps.
Create a 'Recovery' Role
Today, recovery rarely has its own dedicated specialist. Instead, the role is spread across many different positions. Or there’s no one person actually tasked with ensuring the business can get back online.
That must change. If the business can afford it, hiring a dedicated individual to oversee data recovery is the most effective option. This person should spend their time talking to employees across the business to figure out what IT tools are critical to their jobs. Then, the company can begin to categorize applications by their necessity. Naturally, the more vital the system, the quicker it will need to be restored.
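As a rough illustration of that categorization step, the sketch below groups applications into restore tiers by how many teams say they cannot work without them. The survey data, the tier thresholds, and the `categorize` function are all hypothetical; a real ranking would also weigh factors such as revenue impact or regulatory deadlines.

```python
from collections import Counter

# Hypothetical interview results: each team lists the tools it cannot work without.
survey = {
    "finance": ["erp", "email", "payroll"],
    "sales": ["crm", "email"],
    "support": ["ticketing", "email", "crm"],
}


def categorize(survey, tier1_min=3, tier2_min=2):
    """Group applications into restore tiers by how many teams depend on them.

    Tier 1 is restored first. The thresholds are illustrative assumptions.
    """
    # Count each application once per team, even if a team lists it twice.
    counts = Counter(app for apps in survey.values() for app in set(apps))
    tiers = {1: [], 2: [], 3: []}
    for app, team_count in sorted(counts.items()):
        if team_count >= tier1_min:
            tiers[1].append(app)
        elif team_count >= tier2_min:
            tiers[2].append(app)
        else:
            tiers[3].append(app)
    return tiers


tiers = categorize(survey)
for tier, apps in tiers.items():
    print(f"Tier {tier}: {', '.join(apps)}")
```

Counting teams is only a crude proxy for necessity. A system that a single team relies on, such as payroll, may still deserve a higher tier, which is why the conversations with employees matter more than the tally itself.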
Someone in a dedicated recovery role can also spend more time running through possible scenarios in their head. Then, they can build the right response plans to make recovery much faster.
However, many IT and security teams won’t have the budget for a specialist. In those instances, adding the responsibility of overseeing recovery to someone’s daily job could make a difference. That will mean removing some other, less vital responsibilities from this individual’s workload. Otherwise, they won’t be able to adequately focus on testing the company’s ability to bounce back from an incident.
And small improvements in testing can make a big difference. Even dedicating a few hours to testing could lead to faster and more efficient recoveries. But it all begins with making someone responsible for examining the existing threat landscape, pinpointing the biggest areas of risk, and then establishing the right response plans.
Taking an Iterative Approach
Many IT teams won’t have the money, time, or capability to test their entire environment. Instead, companies should start where they can, then strive to gradually test more.
A business might only be able to test 20% of its environment. If so, it should start with the most important 20%. As the company learns to run these recovery tests more smoothly, it can gradually add other applications to the mix and test a steadily larger portion of the environment.
But it’s unlikely enterprises will ever test their full IT environment. In fact, even striving for 100% might be a mistake. It’s a goal few organizations will be able to achieve. Most can't take critical systems -- like Active Directory -- offline for a test, because doing so would cripple operations. Cleanrooms, which are secure, separate spaces designed to be isolated from infected software or hardware, are helping organizations work around that constraint. The technology provides a safe testing area in the cloud. That makes it possible to run recoveries without interrupting daily operations.
Ultimately, being able to quickly and efficiently get even a small section of the IT environment back online can often suffice. It indicates the company’s strategy is sound and, ideally, will be scalable in the event of a more drastic incident.
Shoot for Failure
It may sound counter-intuitive, but companies actually want to fail these tests -- at least, some of the time.
No business can expect constant perfection. Failures highlight the gaps. And ultimately, the more of these gaps the company identifies and fixes, the more resilient the organization will be in the end.
Increasingly, the failures may not even be the result of any underlying problems. Instead, they stem from complications companies run into as they try to automate more of the process. Many security functions still rely on manual tasks by overworked specialists. The eventual goal, though, is to be able to run a full restore with just the click of a button. That requires training. Each failure helps an organization get closer to a fully automated process.
At the end of the day, it doesn’t matter if the business is testing 5% of its environment or 50%. The key is just to start testing -- and to establish a repeatable and scalable process to continually loop in a wider swath of technologies.