Tech Lead, Site Reliability Engineer at DC SCORES

Summary

Join Ditto, a fast-growing startup redefining data movement at the edge, as a Lead Site Reliability Engineer. You will lead and mentor a regional SRE squad, setting the standard for enterprise-grade reliability. Responsibilities include incident management, architecting observability solutions, implementing SLIs/SLOs/SLAs, and establishing best practices. This role requires significant experience in SRE or DevOps, leading technical teams, and working with modern monitoring stacks and cloud infrastructure. Ditto offers competitive salaries, equity, and benefits varying by region, including health insurance, retirement plans, flexible time off, and remote work options.

Requirements

7+ years of experience in Site Reliability Engineering or similar DevOps roles with a focus on system reliability and incident management
3+ years of experience leading and mentoring technical teams
Strong experience with modern monitoring stacks including Prometheus, Grafana, and Datadog
Proficiency in at least one systems programming language, such as Go, Rust, C or Java
Experience with Infrastructure as Code tools, like Terraform and Helm
Hands-on experience architecting applications for Kubernetes, and managing Kubernetes infrastructure
Experience with AWS and at least one other major cloud service provider (GCP, Azure)
Excellent communication skills; you’ll set the standard for clear and succinct communication in incidents, hand-offs, and project updates
Experience maintaining on-call rotations and incident response procedures
A high degree of agency, taking ownership of problems and identifying initiatives and improvements
Proven project management skills and the ability to balance competing priorities and interrupts
Understanding of security best practices in cloud environments

Responsibilities

Line manage your regional squad of SREs, providing leadership and setting the standard for enterprise-ready reliability
Develop a high-performing team through mentoring, coaching, and creating growth opportunities for engineers
Engage with incident management and escalations, ensuring your squad continually improves its incident response and actively owns follow-ups
Architect enterprise-grade observability solutions across complex distributed systems
Actively lead and manage SRE initiatives, co-ordinating across teams where needed
Guide the implementation of SLIs, SLOs, and SLAs that align with business objectives
Establish best practices for documentation, runbooks, and knowledge sharing across engineering
Play an active role in on-call, and manage your squad’s rotation

Preferred Qualifications

Experience directly line managing SREs
Experience building or operating multi-tenant, multi-cloud SaaS/DBaaS platforms
Familiarity with edge computing or mesh networking
Experience instrumenting advanced observability practices (tracing, profiling) in distributed systems
Experience working with globally distributed teams across EMEA and APAC regions

Benefits

Health, dental, vision, life, and disability insurance
401(k) and flexible spending accounts
Private healthcare through Vitality
A pension plan
Flexible time off

Remote

DevOps

Senior
