Summary

Join SigFig's Infrastructure & DevOps team as Manager, SRE, and lead a hands-on technical team supporting mission-critical systems. This role focuses on proactively improving system resilience, driving automation, and enhancing incident response. You will collaborate with engineering, SRE, and security teams to streamline deployment processes and ensure high availability of services. The position requires managing and scaling infrastructure with tools such as Terraform, Kubernetes, and Puppet, and acting as the first technical escalation point for production incidents. You will lead post-incident reviews and contribute to a growing incident knowledge base. SigFig offers competitive benefits, including flexible PTO, wellness benefits, and a remote-first work environment.

Requirements

7+ years of experience in SRE, DevOps, or Technical Operations roles
2+ years in a leadership role managing global, distributed teams in a high-uptime environment
Proven experience with AWS, GCP, or Azure, and implementing infrastructure as code at scale
Strong scripting skills in Python, Bash, or a similar language for automation and operational tooling
Deep understanding of observability and incident management best practices
Experience with CI/CD and deployment orchestration tools
Familiarity with containerized and microservices-based architectures
Passion for automation, reliability engineering, and continuous improvement
Excellent communication and leadership skills to coordinate across global teams

Responsibilities

Lead a global, distributed SRE/DevOps team operating in a 24/7 production environment
Develop and implement automation frameworks for self-healing, auto-remediation, and system optimization
Enhance monitoring and observability through tools like Splunk, Prometheus, and AI-powered alerting platforms
Improve CI/CD pipelines using Jenkins, GitHub Actions, and ArgoCD, and drive continuous delivery at scale
Manage and scale infrastructure using Terraform, Kubernetes, Puppet, and similar tools
Act as the first technical escalation point for Level 2/Level 3 troubleshooting of production incidents involving Linux servers, cloud networking, and Kubernetes clusters
Lead post-incident reviews, implement automated solutions for root cause issues, and contribute to a growing incident knowledge base
Collaborate cross-functionally with Engineering, Security, and Product to align reliability initiatives with business objectives
Establish and enforce SLOs and error budgets to continually raise system reliability standards

Preferred Qualifications

Previous experience in fintech or highly regulated environments is a plus

Benefits

Flexible PTO
Quarterly Wellness Benefit
Mobile/Internet subsidy (for a smooth WFH experience)
Employee Recognition Program
Tax-friendly Compensation
Liberal Leave Policy
Medical cover for the family, including parents
WFH Allowance
Employee Referral Program

Manager - SRE

SigFig

Remote

DevOps

Manager
