Are you a site reliability engineer, or are you interested in becoming one? There are hundreds of open SRE jobs waiting to be filled in Austin at any given time. It’s also considered a high-income career, with compensation often clocking in at over $140,000. So what does the job entail, and what do we look for when recruiting top site reliability engineer talent?
What is a Site Reliability Engineer?
“A site reliability engineer creates a bridge between development and IT operations by taking on the tasks typically done by operations,” Flagship by AB Tasty explains in its software release glossary. “Instead, such tasks are given to these types of engineers who use automation tools to solve problems by creating scalable and reliable software systems.”
Austin startup BenchSci describes its senior SRE as someone who:
- Applies their technical and domain expertise to solve complex technical and business challenges
- Participates in design discussions, code reviews, and project-related team meetings
- Works with other engineers to develop innovative solutions that meet business needs concerning functionality, performance, observability, scalability, and reliability
The role was first defined by Google on the basic tenet that doing operations well is a software problem and should therefore use software engineering approaches “across a wide field of view, encompassing everything from process and business change to similarly complicated but more traditional software problems, such as rewriting a stack to eliminate single points of failure in business logic.”
While site reliability engineering and DevOps are separate trending disciplines, Google asserts they overlap and fundamentally work together. From Google’s viewpoint, developers would traditionally “write code with little understanding of how it would run in production…[throwing] this code over the proverbial wall to the operations team, which would be responsible for keeping the applications up and running.” A site reliability engineer can take over from there by running what developers would otherwise throw over to operations.
What Makes a Good SRE?
According to Glassdoor, the top companies hiring site reliability engineer talent right now are Cox Automotive, Indeed, Oracle, Addison Group, Imagine Learning, Apple, Technology Navigators, Hypori, Michael Page, and CyberCoders. The HT Group works directly with other top employers to privately recruit site reliability engineering talent.
“These companies are looking for a special balance of hard and soft skills,” says Paul McGaughan, Practice Director of The HT Group Technical Recruiting. “You’re a liaison between developers and business leaders, so you need to be both personable and technically knowledgeable. You also need to inherently understand and relate to the challenges and motivations of both the technical and operational sides.”
Required hard skills could include a BS or MS in Computer Science or a related area at minimum, plus experience in software development, hardware and software configuration, networking, coding, managing cloud infrastructure and web services, and other skills specific to the employer’s needs. Specific industry experience and niche experience within those industries may be essential. For instance, Tesla looks for varying backgrounds in its site reliability engineers depending on which area they’ll be in: diagnostics, factory software, and vision automation are a few examples.
As The Enterprisers Project points out, site reliability engineer hiring has doubled year over year because organizations are seeking to remove performance bottlenecks, maintain service level objectives, fix complicated problems in real time, and break down the siloed processes of DevOps. Since it’s a relatively new position, specific experience as a site reliability engineer may not be required.
“[Employers] would love to have someone with previous SRE experience, but since the field is relatively new, it can be hard to find those people, particularly for junior-level positions. If you’re currently working as a developer, software engineer, systems administrator, or DevOps engineer, you could probably land an SRE job if you first do some work to fill in any gaps on your current skills list,” writes Cynthia Harvey for InformationWeek. To get an even clearer picture of what a site reliability engineer does, check out these helpful “day in the life” articles from TechTarget and DevOps.com.