Tech Lead, Site Reliability Engineer

DC SCORES

๐Ÿ“Remote

Summary

Join Ditto, a fast-growing startup redefining data movement at the edge, as a Lead Site Reliability Engineer. You will lead and mentor a regional SRE squad, setting the standard for enterprise-grade reliability. Responsibilities include incident management, architecting observability solutions, implementing SLIs/SLOs/SLAs, and establishing best practices. This role requires significant experience in SRE or DevOps, leading technical teams, and working with modern monitoring stacks and cloud infrastructure. Ditto offers competitive salaries, equity, and benefits varying by region, including health insurance, retirement plans, flexible time off, and remote work options.

Requirements

  • 7+ years of experience in Site Reliability Engineering or similar DevOps roles with a focus on system reliability and incident management
  • 3+ years of experience leading and mentoring technical teams
  • Strong experience with modern monitoring stacks including Prometheus, Grafana, and Datadog
  • Proficiency in at least one systems programming language, such as Go, Rust, C or Java
  • Experience with Infrastructure as Code tools, like Terraform and Helm
  • Hands-on experience architecting applications for Kubernetes, and managing Kubernetes infrastructure
  • Experience with AWS and at least one other major cloud service provider (GCP, Azure)
  • Excellent communication skills; you’ll set the standard for clear and succinct communication in incidents, hand-offs, and project updates
  • Experience maintaining on-call rotations and incident response procedures
  • A high degree of agency, taking ownership of problems and identifying initiatives and improvements
  • Proven project management skills and the ability to balance competing priorities and interrupts
  • Understanding of security best practices in cloud environments

Responsibilities

  • Line manage your regional squad of SREs, providing leadership and setting the standard for enterprise-ready reliability
  • Develop a high-performing team through mentoring, coaching, and creating growth opportunities for engineers
  • Engage with incident management and escalations, ensuring your squad continually improves its incident response and actively owns follow-ups
  • Architect enterprise-grade observability solutions across complex distributed systems
  • Actively lead and manage SRE initiatives, co-ordinating across teams where needed
  • Guide the implementation of SLIs, SLOs, and SLAs that align with business objectives
  • Establish best practices for documentation, runbooks, and knowledge sharing across engineering
  • Play an active role in on-call, and manage your squad’s rotation

Preferred Qualifications

  • Experience directly line managing SREs
  • Experience building or operating multi-tenant, multi-cloud SaaS/DBaaS Platforms
  • Familiarity with edge computing or mesh networking
  • Experience instrumenting advanced observability practices (tracing, profiling) in distributed systems
  • Experience working with globally distributed teams across EMEA and APAC regions

Benefits

  • Health, dental, vision, life, and disability insurance
  • 401(k) and flexible spending accounts
  • Private healthcare through Vitality
  • A pension plan
  • Flexible time off
