Summary
Join Gusto's Infrastructure Engineering team as a Sr. Site Reliability Engineer and build secure, resilient, and accessible systems using AWS, Terraform, and Kubernetes. Design and implement production-grade systems, establish automation standards, and plan complex migrations from legacy systems to modern designs. Continuously improve the on-call experience and solve complex problems. The ideal candidate has 8+ years of Software Engineering/SRE experience, strong coding skills (Python, Java, TypeScript, or Ruby), and a systems-thinking approach. Gusto offers competitive compensation ($164,000-$235,000 depending on location). Employees in Denver, San Francisco, and New York work from the office 2-3 days a week.
Requirements
- 8+ years of Software Engineering / SRE experience
- Strong ability to write code in Python, Java, TypeScript, or Ruby
- Strong work ethic; a self-starter who operates effectively amid ambiguity
- Systems thinker who looks for small changes in complex systems that can create a big impact
- Resilient problem solver capable of doing whatever it takes to get work done in service of our peers as well as the entire Gusto customer base
- Effective communicator who can simplify complex topics and present ideas persuasively
Responsibilities
- Design and implement production-grade systems that optimize for resiliency while limiting complexity
- Establish standards and build deterministic automation while optimizing for user accessibility and system reliability
- Plan and execute complex and challenging migrations that help Gusto move forward from legacy systems to more efficient, scalable, modern designs
- Continually improve the on-call experience by proposing and implementing long-term solutions that keep on-call sustainable
Preferred Qualifications
- Experience designing simple, elegant systems that optimize for resiliency yet remain easy to maintain and operate
- Strong experience with Docker and Kubernetes
- Experience with cloud platforms such as AWS (preferred), GCP, or Azure, and with associated infrastructure-as-code tools such as Terraform (preferred)
- Experience leading incident remediation for complex systems
Benefits
- Health insurance
- 401(k)
- Remote work, flexible hours