Lead SRE Engineer


Tide

๐Ÿ“Remote - United Kingdom

Job highlights

Summary

Join Tide's SRE team as a Lead Site Reliability Engineer and help improve the reliability, scalability, and efficiency of our systems. You will coach and enable teams on reliability best practices, automate repetitive tasks, and promote ownership and collaboration. We are looking for a highly experienced engineer with a background in distributed computing, a track record of building and maintaining resilient software across various stacks, and familiarity with SRE practices and monitoring tools. Experience with cloud platforms such as AWS and with Kubernetes is expected. We encourage a culture of operational excellence and continuous improvement, and Tide offers a flexible work environment, competitive benefits, and opportunities for professional development.

Requirements

  • Degree in Computer Science or a related field
  • Experience in distributed computing
  • In-depth knowledge of one or more of the following programming languages: Python, Java, Go, JavaScript
  • Experience with a variety of cloud computing platforms and technologies, including AWS, Kubernetes, Terraform, and EKS
  • Familiarity with observability tooling like Prometheus, ELK, Datadog, Honeycomb or others
  • Familiarity with designing fault-tolerant systems and with reliability practices such as SLIs, SLOs and error budgets

Responsibilities

  • Coach and Enable Teams: Educate teams on ownership principles and reliability best practices to embed a culture of operational excellence
  • Eliminate Toil: Automate repetitive tasks, such as incident mitigation or database failover, to allow teams to focus on creative problem-solving
  • Promote Ownership and Collaboration: Embrace a "you build it, you run it" philosophy, working across teams to ensure system reliability and accountability
  • Drive Observability: Make the best use of our tooling, automate where possible, and help teams stay on top of the status of their services
  • Set the Stage for Incident Management: Help streamline the incident management process and build tooling to support it; when complex problems arise, help teams navigate them

Benefits

  • Flexible working options
  • Share options
  • Group Life Insurance
  • Vitality Health Insurance, with a proactive focus on mental and physical wellbeing
  • 25 days holiday with the ability to buy extra days
  • 3 days per year of time off for L&D or volunteering
  • We invest in your development with a £1,000 professional L&D budget per year
  • Access to ‘salary sacrifice’ benefits such as the Cycle to Work scheme and pension contributions
  • Spacious brand-new office near Old Street with an all-day snacks bar
  • Enhanced family-friendly leave
  • Sabbatical leave

