Site Reliability Engineer

August Health
Summary
Join August Health as a Site Reliability Engineer and help scale and strengthen the foundation of our infrastructure while supporting a product that genuinely impacts people's lives. You will join a small team of engineers working to bring stability, observability, and performance to our systems as we grow. In this role, you'll help improve our operational capabilities, resolve performance bottlenecks, and work closely with support and engineering teams to address issues. We are looking for someone who enjoys problem-solving and infrastructure craftsmanship and who brings both technical insight and strong communication skills. We offer market-competitive compensation based on experience and ability, including significant equity option grants, excellent health, dental, and vision coverage, 401K, and unlimited vacation days. The whole team works remotely.
Requirements
- 5+ years of experience in site reliability, infrastructure, or backend engineering roles
- Familiar with modern observability practices and tools (e.g., Prometheus, Grafana, OpenTelemetry)
- Comfortable navigating large codebases and debugging production issues in complex systems
- You attempt to unblock yourself but know when to bring in help. You communicate well and share what you've learned with others
- You're comfortable participating in team-level discussions, contributing helpful and relevant insights
- You bring a strong, experience-based point of view to technical discussions, but you're also willing to disagree and commit when needed
- You respond well to feedback and see it as an opportunity for growth
- You're a good verbal and written communicator
Responsibilities
- Provide engineering support for our customer support team, investigating and resolving production issues with empathy and speed
- Monitor infrastructure health through observability tools and metrics, proactively identifying and addressing potential issues
- Analyze and optimize slow database queries to improve system responsiveness and scalability
- Tune configuration settings across our platform to improve performance, reliability, and cost-efficiency
- Build and improve internal tooling to support deployment, monitoring, and developer productivity
- Bring a thoughtful approach to incident response, root cause analysis, and documentation of postmortems
Preferred Qualifications
Prior experience in these is helpful but not required. A strong ability to learn quickly and work across the stack is essential.
Benefits
- Excellent health, dental, and vision coverage
- 401K
- Unlimited vacation days