What Special Ops Taught Me About TechOps - InformationWeek


DevOps | Commentary | 11/12/2018 02:00 PM
Lucas Villeneuve, Systems Engineer at Constant Contact

What Special Ops Taught Me About TechOps

A look at how a day of special forces-style training delivered benefits in the technology realm.

Back in the summer of 2015, I found myself hauling a massive log with 16 other people through the narrow streets of Boston, nine hours into an endurance event -- and it turned out to be one of the experiences that most profoundly shaped my approach to TechOps and IT incident management.

The log-hauling was part of my crazy notion to participate in a special forces training event put on by GORUCK, an apparel and gear company founded by a former Army Green Beret. The rules were twofold and simple: we couldn’t put the log down for a single second, and we had to navigate it safely through the city without so much as a scratch on it.

Though this might seem like a rather left-field activity for a systems engineer to take on, it turned out to be the perfect hands-on lesson in the importance of communication in achieving shared goals -- a perspective I have embraced while designing and implementing key incident management processes at Constant Contact.

You see, within incident management teams, we have a saying: “Ops over outcomes.” It sounds counterintuitive at first. After all, we measure performance by business impact, right? However, I’ve witnessed first-hand that if you have the right operations in place, outcomes often take care of themselves. The key ingredient is communication, because it determines how effectively and efficiently your teams can collaborate in real time.

During Operation Log Haul, we had to juggle roles and responsibilities across the team in order to complete our mission. We appointed a team leader and assistant team leader and then coordinated the process of relieving and switching out team members to maximize everyone’s endurance. What became very clear by the end was that it wasn’t our physical prowess that led to our success, but our emphasis on communicating and collaborating as a single, cohesive unit. We had a shared challenge at hand, and it took everyone working together to overcome it.

Solving the shared challenges of incident management

At Constant Contact, we deal with a very high volume of data and activity. We pride ourselves on delivering unparalleled service to our customers, which of course means minimizing downtime and responding to any issue as quickly as possible.

Over the last few years, we’ve identified and rebuilt specific processes that needed improvement, with the goal of making our teams collaborate better. For example, we wanted to improve our escalation procedures through smarter notification delivery and targeting, so that we engaged exactly the right people rather than sending mass alerts. We also wanted measures and structures in place to ensure that individuals were accountable for driving issues to full resolution.

Another challenge was the lack of protocols for recurring or familiar-looking issues. After an incident, we would simply trust that the teams involved in the fix would “keep an eye out” in case a similar issue arose and would not let it happen again. And while our Ops team was used to adapting to new processes and procedures, development had no definitive on-call schedules, which made it difficult to keep everyone on the same page.

We knew these challenges had to be addressed in order to maintain -- and improve upon -- our legacy of excellent service to our customers. As our IT ecosystem grows more complex and our customers’ needs become more demanding, it has become increasingly important to stay ahead of potential issues and avoid downtime wherever possible. We knew that overhauling our communication processes would be instrumental in achieving this goal.

Delivering the promise of TechOps

Ultimately, we set out to unify data sharing and handoffs between all of our tools, and to establish clear-cut escalation logic, automation of certain processes, and integrated incident communication. We introduced xMatters as the integration hub between all of our other tools, including Jira, Nagios, New Relic, BigPanda, and HipChat for ChatOps. We also implemented smarter escalation procedures, including Corrective and Preventive Actions (CAPA) to hold our teams accountable, and targeted notifications that reach the people who are actually on call instead of mass alerts.
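To make the idea of targeted, escalating notifications a little more concrete, here is a minimal sketch in Python. It is not how xMatters works internally and uses none of its API; the `Shift`, `Team`, and `notification_targets` names, the 15-minute escalation step, and the people in the example are all hypothetical. The point it illustrates is simply that an alert goes first to whoever is on call right now, and only widens to an escalation chain, one person at a time, if nobody acknowledges it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Shift:
    person: str
    start: datetime  # inclusive
    end: datetime    # exclusive


@dataclass
class Team:
    name: str
    shifts: list[Shift]
    # Hypothetical escalation chain: who gets pulled in, in order,
    # if the on-call engineer has not acknowledged the incident.
    escalation_chain: list[str] = field(default_factory=list)

    def on_call(self, at: datetime) -> str | None:
        """Return the person whose shift covers `at`, or None if nobody does."""
        for shift in self.shifts:
            if shift.start <= at < shift.end:
                return shift.person
        return None


def notification_targets(team: Team, at: datetime, minutes_unacked: int,
                         step_minutes: int = 15) -> list[str]:
    """Pick who to notify instead of alerting the whole team.

    Start with whoever is on call; every `step_minutes` without an
    acknowledgment adds the next person from the escalation chain.
    If nobody is on call, escalate one step immediately.
    """
    level = minutes_unacked // step_minutes
    primary = team.on_call(at)
    targets: list[str] = []
    if primary is not None:
        targets.append(primary)
    else:
        level += 1
    for person in team.escalation_chain[:level]:
        if person not in targets:  # never page the same person twice
            targets.append(person)
    return targets


# Hypothetical example: two shifts on one day, two people in the escalation chain.
shifts = [
    Shift("alice", datetime(2018, 11, 12, 0), datetime(2018, 11, 12, 12)),
    Shift("bob", datetime(2018, 11, 12, 12), datetime(2018, 11, 13, 0)),
]
mail_ops = Team("mailops", shifts, escalation_chain=["carol", "dave"])
afternoon = datetime(2018, 11, 12, 14)

print(notification_targets(mail_ops, afternoon, minutes_unacked=0))   # ['bob']
print(notification_targets(mail_ops, afternoon, minutes_unacked=20))  # ['bob', 'carol']
```

The design choice worth noting is that escalation is driven by time without acknowledgment rather than by alert volume, so a quiet, ignored incident still reaches more people while a handled one never wakes anyone else up.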

Our incidents have been trending downward at Constant Contact, our unplanned downtime is minimal, and we're now responding to incidents 10 times faster than before. Today, our customers are living proof of the “ops over outcomes” mantra – they are still sending many emails a day, and no doubt benefit from our ability to keep the service as “always on, always available” as possible.

My advice for organizations looking to transform their approach to incident management is to sit down and clearly identify your business’s real communication needs, then build your processes and procedures around them. Paired with the right tools, this is the best way to ensure that your technology and strategy support your business needs.

Lucas Villeneuve is a Systems Engineer at Constant Contact, an Endurance International Group company. Lucas is embedded within the systems team with a focus on MailOps. He has been responsible for helping the company increase command center visibility across its DevOps toolchain and unify its incident management process across multiple teams, resulting in 10x faster incident response.
