Site Reliability Engineer

Octopus Deploy
Summary
Join Octopus Deploy's Build Platform team as a Senior Site Reliability Engineer (SRE) and significantly impact the developer experience. This remote-first role, based in Australia or New Zealand, focuses on maintaining high reliability of build systems, improving existing practices, and implementing new capabilities. You will collaborate with other teams, share expertise, and contribute to a culture of automation and quality. The ideal candidate excels in availability, reliability, and observability, possesses strong systems engineering skills, and embraces a 'you built it, you run it' philosophy. Compensation is competitive and includes benefits such as generous paid time off, parental leave, and stock options. Octopus Deploy is a remote-first company with a transparent and supportive work environment.
Requirements
- Have full working rights and residency in Australia or New Zealand
- The way of working outlined here (https://github.com/OctopusDeploy/People/blob/main/Site-Reliability-Engineering/L3-Senior-Site-Reliability-Engineer.md) is your natural way of getting things done
- Excel in an environment focused on availability, reliability, and observability
- Be skilled in systems engineering, possibly with specialized expertise in specific areas
- Find value in applying safety culture lessons from other industries to your work
- Be adept at leading postmortems and designing deployment and monitoring pipelines
- Have a passion for automating builds, tests, deployments, infrastructure, and operational tasks
- Embrace a "you built it, you run it" culture, with a commitment to quality and system availability, participating in a humane on-call program
- Be self-motivated, work independently with high-quality output, and seek help or new tasks when needed
- Collaborate effectively to solve problems, combining passion, pragmatism, and empathy
- Be results-oriented, adaptive to business direction changes, and encourage the same approach in others
- Thrive on candid feedback, solving complex problems, and helping fellow engineers succeed while working on valuable projects
Responsibilities
- Use your SRE skills to keep the build systems running with high reliability
- Help improve and iterate our existing reliability practices
- Bring new ideas/practices to increase reliability and reduce toil
- Spearhead implementation of new capabilities
- Share SRE expertise with other teams in the company
- Working on building new capabilities to increase reliability (we don't want you staring at monitoring dashboards all day)
- Handling a request from an internal team, helping solve a challenging build, test or packaging issue, or offering advice to an engineer to help them fall into the pit of success
- Pairing with another engineer on a Zoom call to solve a complex technical problem or explore and define the problem space for future innovation
- Responding to an actionable alert and working to maintain the reliability of the platform used across the company
- Improving our documentation to help engineers discover solutions for themselves and reduce lead time
- Writing a blog post about something interesting for other engineers
- Facilitating an incident review or preparing a presentation on what was learned from a recent incident
- Proactively reducing future toil by building automation
Preferred Qualifications
- Experience with the C# application SDLC (e.g. building, testing) is highly regarded
Benefits
- Minimum of 25 days annual leave
- Up to 10 days of paid sick and carer's leave
- 12 weeks of fully paid parental leave with flexible return options
- Stock options
- Remote work