SRE, short for site reliability engineer or site reliability engineering, is a relatively new role that combines software engineering with IT systems management. In fact, it's so new that in the 2019 SRE Report from monitoring vendor Catchpoint, 64% of SREs surveyed said that their companies had been employing SREs for three years or less.
If you're not exactly sure what an SRE does, you're not alone. In a nutshell, you can think of an SRE as a systems administrator on steroids. While a systems administrator might be responsible for deploying, monitoring and managing dozens or hundreds of servers, SREs keep watch over thousands or tens of thousands of systems. It's their job to maintain the reliability customers expect while helping their organizations continue to scale.
The only really effective way to manage so many servers at once is to write software that does most of the work for you. So SREs spend a good bit of time writing scripts and using automation tools.
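As a rough illustration of what "writing software that does the work for you" can look like, here is a minimal Python sketch that checks whether each host in a list accepts TCP connections on a given port, running the checks in parallel so a large fleet doesn't take hours to sweep. The helper names (`tcp_check`, `check_fleet`, `unhealthy`), the default port and the worker count are hypothetical choices for illustration; in practice, much of this job is usually handled by a dedicated monitoring system rather than a one-off script.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def tcp_check(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures, refused connections and timeouts are all OSError subclasses.
        return False


def check_fleet(
    hosts: Iterable[str],
    check: Callable[[str], bool] = tcp_check,
    workers: int = 50,
) -> Dict[str, bool]:
    """Run `check` against every host concurrently; map each host to its result."""
    host_list = list(hosts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(check, host_list))
    return dict(zip(host_list, results))


def unhealthy(status: Dict[str, bool]) -> List[str]:
    """Return the hosts whose check failed, sorted for stable output."""
    return sorted(host for host, ok in status.items() if not ok)
```

Usage might be as simple as `unhealthy(check_fleet(hosts))` with `hosts` pulled from an inventory. Passing the check in as a parameter keeps the fan-out logic separate from what "healthy" means, so the same loop could run an HTTP probe or a disk-space check instead.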
They also spend a lot of time on incident management. In the Catchpoint SRE survey, almost half (49%) said that they had worked on resolving an incident within the last week. When your favorite Web service goes down, an SRE is probably taking the blame and working to fix it -- and that can be a tremendously stressful position.
On the other hand, that stress comes with some definite rewards. According to Hired, SREs earn an average salary of $126,000 per year, and salaries can be even higher in cities with a lot of demand for SREs. The job board also reported which cities have the highest demand for SREs.
So what qualifications do you need to land one of these lucrative positions? Most companies are looking for someone with a bachelor's degree in computer science or an equivalent level of expertise. They would love to have someone with previous SRE experience, but since the field is relatively new, it can be hard to find those people, particularly for junior-level positions. If you're currently working as a developer, software engineer, systems administrator, or DevOps engineer, you could probably land an SRE job if you first do some work to fill in any gaps on your current skills list.
What follows are nine steps for moving from your current IT role to a job as a site reliability engineer.
1. Start with Some Research
This article can only scratch the surface of what site reliability engineering is all about, so you really need to do some additional research if you want to pursue this career path. Google wrote the book on site reliability engineering -- literally. The search giant was one of the first companies to employ SREs, and some of its best engineers wrote a book called Site Reliability Engineering that you can read online for free.
Other resources that could help you learn more include the SRE Weekly newsletter and SREcon. YouTube has some good SRE presentations, such as "The Keys to SRE," "Site Reliability Engineering at Dropbox" and "Netflix: 190 Countries and 5 CORE SREs." And for the last couple of years, the annual Interop conference has also had sessions devoted to SRE.
2. Know Yourself
Once you think you have a handle on what being an SRE is all about, experts say that you should think long and hard about whether it's a good fit for you. Most IT professionals are strongest in one main area, such as writing code, managing projects or maintaining servers. SREs must be good at (and interested in) lots of things, including both coding and managing systems. It also helps to be the kind of person who really enjoys finding novel solutions to problems. SREs often tackle issues that no one has ever seen before, so you probably won't be able to find the answers with a simple Google search. But if you love a challenge, don't mind stress and can see opportunities hidden inside problems, site reliability engineering might be a good choice for you.
3. Develop Your Software Engineering Skills
SREs have to be able to code. If you're currently working as a developer or software engineer, you've probably got this aspect covered. If not, you might want to take some classes or even enroll in a coding bootcamp to bring your programming skills up to par.
Which languages should you learn?
SREs should know a scripting language. Python is most common, but some companies are also looking for people who know Ruby or Bash. Some employers are also looking for experience in other programming languages, such as Java, C/C++, Go, or Perl. If you have an idea of which company you would most like to work for, check out their SRE job postings to see what they require. If you're not sure where you want to work, start by learning Python.
4. Expand Your Systems Knowledge
SREs also need to understand systems -- particularly Linux and cloud-based systems. Many SREs work for large Web-based companies or software as a service (SaaS) firms, so many of the systems that they manage are based in the cloud.
If you don't have a lot of experience in this area, you can supplement your knowledge with online courses. All of the major public cloud providers have extensive learning resources available. You can also check out sites like edX, Coursera and Udemy for free or paid classes related to infrastructure, servers, cloud computing, Linux and more.
5. Gain Experience with DevOps and Automation Tools
Many people describe site reliability engineering as an extension of DevOps or "taking DevOps to the next level," and many organizations list knowledge of or experience with DevOps among their requirements for SREs. The two do overlap in many ways, particularly in their emphasis on using automation tools to help manage infrastructure.
If you have little experience with DevOps, you might want to do some research into the approach or even take a class. You might also want to gain familiarity with some of the most popular tools used by DevOps professionals and SREs, such as Docker, Kubernetes, Chef, Puppet, Jenkins and Ansible.
6. Investigate the Companies That Hire SREs
The list of companies hiring SREs reads like a list of who's who in the technology world. At the time of writing, a search for open SRE positions turned up jobs at Google, Intel, NetApp, HPE, Tesla, Apple, Workday, Microsoft, Adobe, Fitbit, Yelp, Slack, Twitter and Electronic Arts, among many others. Some non-tech companies, like Nordstrom, MasterCard, Capital One, Bloomberg and The Walt Disney Co., are also looking for SREs.
Because the SRE position is fairly new, different companies have different expectations of people with this job title. Google has a specific (and very influential) perspective on the appropriate philosophy and responsibilities for SREs, but not all companies share that approach. Before you apply for an SRE position, get as much information as you can about what SREs do at the particular company you're considering.
7. Network with Other SREs
How can you get information about what SREs do at different companies? The best way is by talking to actual SREs. Look for meetups and conferences in your local area where you can get to know other site reliability engineers. These people can be an invaluable source of advice and can alert you to open positions that might be a good fit. You might also want to connect with other SREs on social media and read blogs related to site reliability engineering.
8. Prepare for Your Interview
Like many other tech-related positions, SRE interviews can be intense. In addition to the usual questions about your previous experience and job goals, you should expect that the interviewer will want to see some of your capabilities in action. This might mean asking you to do some coding exercises or asking you how you would approach a particular troubleshooting problem. They might even give you a project to take home and return.
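To give a feel for the scale of a typical coding exercise, here is one made-up example (not taken from any particular company's interview): scan a web server access log in Common Log Format and report which request paths are returning the most server (5xx) errors. The function name `top_server_errors` and the regular expression are illustrative assumptions, written as a short sketch in Python.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Matches the quoted request and the status code of a Common Log Format line, e.g.
# 10.0.0.1 - - [10/Oct/2019:13:55:36 +0000] "GET /api/cart HTTP/1.1" 503 512
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})\b')


def top_server_errors(lines: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Return up to n (path, count) pairs with the most 5xx responses, most frequent first."""
    errors: Counter = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip malformed lines rather than crashing
        if match.group("status").startswith("5"):
            errors[match.group("path")] += 1
    return errors.most_common(n)
```

Because the function accepts any iterable of lines, an open file can be passed in directly and streamed rather than loaded into memory all at once. Choices like that, or deciding to skip malformed lines, are the kind of reasoning an interviewer is likely to want you to talk through.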
Most of the time, managers are not looking for a particular answer in these situations. Instead, they want to get a feel for how you think and how you approach issues (as well as double-checking that you really do have the skills you are claiming on your resume).
You can't plan your answers in advance for these kinds of questions because you don't know what they might ask. However, doing a practice interview beforehand might help you feel more confident and less flustered during the real interview. You can also prepare by putting together a list of questions that you have about the job.
9. Keep on Learning
After you've landed your dream job as an SRE, your training has really only just begun. This field is changing incredibly rapidly, so you'll need to continue learning if you want to keep up. Plan to take classes and attend conferences, as well as read articles and watch videos related to the field. Experienced SREs say constant training is a way of life if you want to stay in this role.
Cynthia Harvey is a freelance writer and editor based in the Detroit area. She has been covering the technology industry for more than fifteen years.