Finding Balance in Dev vs. Ops for Site Reliability Engineers - InformationWeek


07:30 AM

Finding Balance in Dev vs. Ops for Site Reliability Engineers

Results from a recent survey show some organizations have pushed SREs in directions that underutilize and squander their talents.

The demands organizations put on site reliability engineers push them to devote more time to the operations side of their responsibilities rather than maintain an even balance. Catchpoint released its 2020 SRE Survey Report, which gathered responses from more than 600 site reliability engineers from around the world. The annual survey was conducted in two rounds, the first in February and the second in May. Those results, along with perspectives from experts at Volterra, point to how the role of SREs is being reshaped.

Though it has been posited that a 50-50 split between development and operations is ideal for SREs, the majority of the Catchpoint survey respondents indicated they spend 75% of their time on operations. That imbalance can affect job effectiveness, with 53% of the respondents saying they were brought in “too late” during the application lifecycle. This may be a sign that organizations should rethink how they utilize SREs as the role continues to evolve.

What companies expect out of their site reliability engineers can vary based on management’s understanding and intentions for the role. “A lot of organizations have put the word SRE in ops titles because it’s more fashionable,” says Mehdi Daoudi, CEO of Catchpoint. In such cases, he says, the engineers might not perform traditional SRE duties, which may include engineering, automation, and monitoring. “One of the biggest challenges we see this year is people are not taking full advantage of what a true SRE team can bring to the table,” Daoudi says.

Image:  SolisImages -

When SREs have the bandwidth to fulfill their core duties, he says, they can improve scalability, resiliency, and monitoring while maintaining overall functionality. The imbalances in SRE job responsibilities shown in the survey responses, Daoudi says, tend to come from organizations that still have legacy applications and infrastructure. “SREs are thrown into the fire to maintain things,” he says. Organizations with legacy technology that are also on a path to cloud, microservices, or containers tend to involve SRE teams in end-to-end platforms, Daoudi says.

Changes in the duties of SREs have been accelerated by migration to distributed cloud, says Jakub Pavlik, Volterra’s director of engineering. “Before, people just had datacenters that were all centralized.” The rise of hybrid cloud and DevOps made organizations want to move quickly and automate application deployment, he says.

The effects of COVID-19 further pushed the move to distributed cloud, which spurred the need to set up multiple locations, providers and edge computing, Pavlik says. That can put more pressure on SREs to focus on the operations side of their duties. “They don’t have as much time for some development activities because they are overburdened on making sure all the systems are running,” he says.

Successful implementations of SRE teams at disruptors such as Netflix and Google naturally have not always been matched by other enterprises, Pavlik says. Some companies simply renamed their operations team the SRE team, but he believes any current confusion will clear up over time. Pavlik says Volterra partially runs different workloads on different cloud providers and sees challenges in standardizing monitoring and observability. That makes finding staff to fill SRE roles vital, though challenging in the current market. “Getting SRE people is not easy,” he says. “Even if you have unlimited budget, you will have a hard time getting so many talented people. It needs to be solved by right-tooling and automation.”

Catchpoint works largely with SRE organizations, and Daoudi says the companies that are most successful tend to take on new projects, designs, or initiatives in bite-size portions rather than tackle everything at once. Still, some organizations try to make moves in a hurry with monolithic systems that he says are not well-suited for such approaches.

Adapting SRE principles to the organization is essential, Daoudi says, rather than strictly following examples set by other enterprises. “Rewrite the [Google SRE] guidelines for your organization and system,” he says. “This SRE transition reminds me of agile 20 years ago, where you don’t just go overnight. There are baby steps that people need to adopt.”

Taking into account the nuances of what SREs can do rather than lumping them into operations may be a way for enterprises to better utilize their skills. Daoudi says some organizations specialize their SRE teams in areas such as CDN traffic, traffic engineering, and multicloud infrastructure. SRE organizations can also be a conduit for bringing observability to life, he says, which can drive an organization to achieve its objectives. “I think you’re going to see a lot of things made specialized when it comes to machine learning and being able to write algorithms to go through the vast amount of telemetry being collected.”

For more on site reliability engineering, follow up with these stories:

Study: Cloud Migration Gaining Momentum

Site Reliability Engineers: Living Under High Pressure

IT Careers: How to Get a Job as a Site Reliability Engineer

Joao-Pierre S. Ruth has spent his career immersed in business and technology journalism, first covering local industries in New Jersey, later as the New York editor for Xconomy delving into the city's tech startup community, and then as a freelancer for such outlets as ...