Proverbs 27:17 says: “As iron sharpens iron, so one person sharpens another.” Countless professional sports teams have cited this line for team building and training. Far less often do you hear it discussed in the context of running an engineering team in a corporate environment.
Rather than putting your engineers through cookie-cutter exercises that they’ve likely completed previously in job interviews or school, you should be seeking ways to teach them how to deal with intense, unanticipated pressure. How else can you prepare your team to be successful, if not by training in high-stress environments just as a professional athlete would? Even the biggest sports stars get injured and bruised in team scrimmages, and engineering drills are one of the best ways to see how your team functions in a competitive atmosphere while exposing areas for improvement.
I’m going to share some insights into a few of the favorite engineering drills we run our team through at Okta. I’ll also touch on how we treat these drills as specific standards and how they help your employees mature.
Before we allow anyone on our team to perform maintenance in production, we put them through a rigorous “War Games” simulation. We ask the engineer-in-training to work on a task while a more experienced team member simultaneously tries to cause disruptions that test the new engineer’s ability to respond to unexpected incidents under simulated pressure. We even have someone impersonating me (although sometimes I like to get my hands dirty too), requesting additional information or asking about customer impact, expected time to resolution, and so on.
Maddening, no doubt, but all of these distractions are intended to raise the temperature and create a scenario that is as close to a real-world situation as possible. If executed properly, ad hoc tests will require your engineer to run through their competencies in identification, troubleshooting, quick mitigation, and communication. Since they won’t have the luxury of established procedures to operate under, a solid understanding of how to approach any type of problem will go a long way toward ensuring their success. You want everyone on your team to be well prepared to handle real-life crises without having a personal meltdown, which means that everybody should train, prepare and ultimately earn their black belt!
It’s important to differentiate our “War Games” simulation from traditional “disaster recovery drills” (which we also do). Traditional disaster recovery drills simulate something that happens in production to test an individual’s ability to respond using established procedures or “run books.” The “War Games” we conduct are valuable because they force engineers under pressure to think creatively and create ad hoc solutions without the safety net of a manual on how to handle standard situations. At Okta, once engineers have passed the “War Games” simulation with flying colors, they are considered to have earned their black belts and to be ready to face the world.
Cops and robbers
As an innovative engineering team, you will see your architecture continue to evolve. So, as you introduce changes or new components, you also need to make sure your team’s knowledge of them evolves as well. Keeping the team’s skills sharp is of utmost importance, and that includes being able to handle situations that involve new failure modes, new components or new pieces of infrastructure that they have not seen before.
To do that at Okta, we run “Cops and Robbers” simulations. These simulations are more structured than “War Games” and are designed for training rather than testing. During “Cops and Robbers” the team is given an overview of the technology being introduced and what to expect from it. Then, the team is divided into “Cops” and “Robbers.”
The “Cops” will be tasked with understanding how to monitor several metrics of the new technology to ensure it is working correctly and to take manual action if needed to restore service when a disruption is identified. On the other hand, the “Robbers” will purposely introduce several disruption modes and monitor how the disruptions are affecting the system. During the simulations, the “Cops” can learn how to monitor the technology and identify “good” and “bad” patterns that will be turned into automated alerts once the technology is in production.
It’s also important to note that these exercises have an added benefit for the infrastructure architects in our organization: during these drills, if our architects find that the technology being introduced doesn’t fail over adequately or requires “manual” intervention from the “Cops” to mitigate the disruption, the technology will go back to the lab and won’t be enabled in production until it’s ready.
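To make the idea concrete, here is a minimal Python sketch of how the “good” and “bad” patterns the “Cops” observe during a drill might be turned into an automated alert, along with the “back to the lab” check. Everything in it is an assumption made for illustration: the `Observation` record, the latency metric, the sample values and the midpoint rule for picking a threshold are hypothetical and do not describe Okta’s actual tooling.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Observation:
    """One metric sample recorded by the Cops during a drill (hypothetical shape)."""
    latency_ms: float        # example metric; any numeric health metric would do
    disrupted: bool          # True while a Robber-injected disruption was active
    manual_fix_needed: bool  # True if a Cop had to intervene by hand


def derive_threshold(observations):
    """Pick an alert threshold halfway between the mean 'good' and mean 'bad' values.

    Simplifying assumption: the metric rises when the system is disrupted.
    """
    good = [o.latency_ms for o in observations if not o.disrupted]
    bad = [o.latency_ms for o in observations if o.disrupted]
    if not good or not bad:
        raise ValueError("need both healthy and disrupted samples")
    return (mean(good) + mean(bad)) / 2


def should_alert(latency_ms, threshold):
    """Automated alert rule: fire when the metric crosses the learned threshold."""
    return latency_ms > threshold


def ready_for_production(observations):
    """False if any disruption needed manual intervention to mitigate."""
    return not any(o.disrupted and o.manual_fix_needed for o in observations)


# Made-up samples from a single drill.
drill = [
    Observation(100.0, False, False),
    Observation(120.0, False, False),
    Observation(480.0, True, False),
    Observation(520.0, True, True),
]
threshold = derive_threshold(drill)
print(threshold)                       # 305.0
print(should_alert(450.0, threshold))  # True
print(ready_for_production(drill))     # False: one disruption needed a manual fix
```

A real alert would likely weigh several metrics and their variance over time; the midpoint rule is only meant to show the step from observed patterns to an automated check.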
Engineering drills aren’t just for the rookies
While passing the “War Games” is the standard that we set for our new engineers hoping to work in production and “Cops and Robbers” is what we do to keep their skills sharp, you shouldn’t limit simulations to new employees or new technology. “Cops and Robbers” and periodic revisions of “War Games” are critical for keeping even the most veteran of engineers sharp and training your team in new versions of existing technology. You can always try to mastermind new testing situations that push your engineers to the edge of their capabilities. In fact, it’s even okay if you push them over the edge to fail, forcing them to reassess themselves.
When considering various drills to put your team through, ask these basic questions: What does the scenario look like? Who’s in charge? How long will it take? How will the architecture react to failure? How do you measure and monitor the results? What are the next steps once the drill is over?
The benefits of running these kinds of drills will not go unnoticed. Your team will be always on and ready to respond, an axiom our team at Okta lives by, and will constantly think of new ways not simply to pass these tests but, most importantly, to improve their day-to-day work and preparedness.
Hector Aguilar, Executive Vice President of Engineering and Chief Technology Officer at Okta.