Summary
Join Gusto as a Sr. Software Engineer in Reliability Engineering and contribute to building and maintaining reliable systems. You will design and implement dashboards, alerting systems, and developer tools, while leading the adoption of DevOps practices. Your responsibilities include automating reliability and observability, mentoring engineers, and establishing engineering standards. The ideal candidate possesses strong problem-solving and communication skills, along with 5+ years of software engineering experience. Gusto offers competitive compensation and benefits, including health insurance, 401(k), and flexible work arrangements.
Requirements
- Strategic thinker, driven to identify high-impact opportunities and efficiently implement systemic solutions
- Resilient problem solver, inspired to be in service of our peers and Gusto's customers
- Strong communicator, committed to driving alignment across technical and non-technical stakeholders
- 5+ years of professional experience as a software engineer
- Experience implementing and integrating observability platforms (Datadog preferred)
- Experience with incident remediation and development of incident management programs
Responsibilities
- Build Tooling & Infrastructure: Design and implement reliability dashboards, AI-driven alerting systems, and internal developer tools that promote operational excellence and self-service
- Drive Strategic Initiatives: Lead the adoption of DevOps practices across product engineering teams, including environment standardization, service readiness, and release reliability
- Automate Reliability & Observability: Develop intelligent systems for automated alerting, diagnostics, and incident response using AI/ML approaches. Enhance observability through centralized dashboards and proactive monitoring strategies
- Mentor & Influence: Coach engineers and leaders on DevOps best practices, champion reliability-focused principles, and mentor peers in systems thinking and operational maturity
- Establish Standards & Automation: Define engineering standards and implement deterministic automation with a focus on usability, accessibility, and long-term system resilience
Preferred Qualifications
- Experience with Ruby, Python, and TypeScript
- Deployment and operation of cloud infrastructure (AWS preferred)
- Provisioning and managing infrastructure using Infrastructure-as-Code tools (Terraform preferred)
- Deploying and operating container orchestration (Kubernetes preferred)
- Proficiency in Linux system administration and comfort working in shell environments
- Experience designing and supporting high-availability architectures and scalability strategies
- Participation in service extraction efforts to break apart monoliths and transition toward a service-oriented architecture
Benefits
- Health insurance
- 401(k)
- Flexible work arrangements