Manager of Engineering, SRE at Platform Science

Summary

Join Platform Science as a Site Reliability Engineering (SRE) Manager and lead a high-performing team that ensures system reliability, scalability, and efficiency. You will coach the team, promote best practices, and enable development teams to deliver production-ready applications. The role involves overseeing multiple projects and initiatives while maintaining clear communication across teams. The ideal candidate has 5+ years of software engineering or SRE experience, including 2+ years in a leadership position, and proven expertise with tools such as Kubernetes, ArgoCD, AWS, and Prometheus. Platform Science offers a comprehensive benefits package including medical, dental, vision, disability, life insurance, 401k, paid time off, and parental leave. The estimated base salary is between $134,550 and $200,000.

Requirements

5+ years of experience in software engineering or SRE roles
2+ years in a leadership or management position
Proven expertise with Kubernetes, ArgoCD, AWS, Prometheus, Grafana, Datadog, Fluentd, Jenkins, and Docker
Strong knowledge of CI/CD and GitOps practices
Excellent verbal and written communication skills
Demonstrated ability to track and prioritize multiple projects, requests, and initiatives effectively
Bachelor’s degree in Computer Science, Engineering, or equivalent experience

Responsibilities

Recruit, train, and mentor a team of Site Reliability Engineers to deliver operational excellence
Foster a culture of innovation, collaboration, and adherence to SRE principles like SLOs, error budgets, and production readiness
Standardize and train development teams on observability tools such as Prometheus, Grafana, and Datadog
Enhance developer and release workflows using CI/CD best practices, GitOps methodologies, and tools like Jenkins, ArgoCD, and Docker
Drive application and system resilience through chaos engineering, load testing, and automation
Collaborate with teams to define SLIs and SLOs and to manage error budgets
Manage on-call rotation schedules, optimize alerting processes, and ensure 24/7 production application support
Serve as the escalation point for incident resolution, providing guidance and technical expertise
Build tools, dashboards, and processes to improve incident response, production health, and system reliability
Conduct quarterly "State of the Service" reviews to assess performance, sustainability, and risks
Track and prioritize multiple initiatives while ensuring the team stays focused and aligned with organizational goals
Maintain detailed documentation on team projects, requests, policies, and best practices
Communicate effectively across teams, departments, and stakeholders to ensure alignment and a clear understanding of SRE initiatives
Evangelize SRE practices across the organization and ensure consistent adoption of reliability-focused processes

Benefits

Medical, dental, and vision insurance
Short-term and long-term disability insurance
AD&D and life insurance
401k plan
Paid vacation, sick leave, and holidays
Six weeks of paid parental leave

Tags: Remote, DevOps, Manager