Site Reliability Engineer

Cutover
Summary
Join Cutover's UK team as a Junior Site Reliability Engineer (SRE) and contribute to the reliability and performance of our production systems and applications. This role is ideal for early-career professionals or those transitioning into SRE, involving incident response, collaboration with cross-functional teams, root cause analysis, and enhancing observability tools. You will also contribute to automation, documentation, and development of internal tools. The position requires familiarity with scripting languages, containerization or IaC, observability tools, and core networking concepts. The role offers a competitive compensation package, including stock options, 25 days of PTO, private health insurance, a pension scheme, and a personal learning and development budget. The work location is flexible, allowing for fully remote work within the UK or a hybrid model with up to two days in the London office.
Requirements
- A genuine excitement for complex problem solving within our tech stack, applying what you know to our unique problems
- Familiarity with at least one scripting language such as Ruby, Java, Python, or Bash
- Exposure to containerization (e.g. Docker) or IaC (e.g. Terraform, Helm, CloudFormation)
- An eagerness to follow modern engineering practices and learn from others
- Introductory experience or coursework involving observability tools such as DataDog, New Relic, Grafana, Prometheus, ELK, or OpenTelemetry
- Understanding of core networking concepts (DNS, HTTP/S, load balancing, etc.)
- A collaborative mindset with clear communication skills
- A willingness to ask questions to better understand new or complex concepts
Responsibilities
- Incident Response: Respond to incidents and alerts, triaging urgency and investigating root cause
- Collaboration: Support cross-functional teams during investigations and post-incident reviews
- Root Cause Analysis: Contribute to post-mortems and help identify long-term improvements under guidance
- Observability: Support and enhance observability tools and techniques by identifying metrics, logging, and alerting improvements
- Automation: Write and execute simple automation scripts (e.g. Python, Ruby, Bash) to improve reliability and reduce toil
- Documentation: Contribute regularly to our documentation on system design, troubleshooting, best practices, and engineering processes
- Development: Work on internal tools, pipelines, and IaC solutions to help improve the speed of software delivery and recovery
Preferred Qualifications
- Exposure to major incident response processes
- AWS Certified Cloud Practitioner or hands-on experience with cloud environments
Benefits
- Share Options as part of our compensation package
- 25 days of PTO per year + public holidays, and we want you to take all of them!
- 3 volunteer days to use for any charitable/voluntary cause you would like
- A top-tier private health insurance package
- Aviva pension scheme
- Work from home stipend
- A personal learning and development budget through Learnerbly. You'll be supported in your quest for knowledge, whatever that looks like to you
- If you're thinking of starting or growing your family, then you'll be in great company - more than half of our team are parents and we've built a globally consistent parental leave approach that we're proud of
- Employee Referral Scheme
- Safeguarding the mental health of our teams is paramount for us. If you'd like to, then you'll be able to avail yourself of multiple Cutover mental health initiatives, from fully subsidised therapy sessions to subscriptions to leading wellbeing platforms