Site Reliability Engineer

August Health

๐Ÿ“Remote - United States

Summary

Join August Health as a Site Reliability Engineer and help scale and strengthen the foundation of our infrastructure while supporting a product that genuinely impacts people's lives. You will join a small team of engineers working to bring stability, observability, and performance to our systems as we grow. In this role, you'll help improve our operational capabilities, resolve performance bottlenecks, and work closely with support and engineering teams to address issues. We are looking for someone who enjoys problem solving and infrastructure craftsmanship and who brings both technical insight and strong communication skills. We offer market-competitive compensation based on experience and ability, including significant equity option grants, excellent health, dental, and vision coverage, 401K, and unlimited vacation days. The whole team works remotely.

Requirements

  • 5+ years of experience in site reliability, infrastructure, or backend engineering roles
  • Familiar with modern observability practices and tools (e.g., Prometheus, Grafana, OpenTelemetry)
  • Comfortable navigating large codebases and debugging production issues in complex systems
  • You attempt to unblock yourself but know when to bring in help. You communicate well and share what you've learned with others
  • You're comfortable participating in team-level discussions, contributing helpful and relevant insights
  • You bring a strong, experience-based point of view to technical discussions, but you're also willing to disagree and commit when needed
  • You respond well to feedback and see it as an opportunity for growth
  • You're a good verbal and written communicator

Responsibilities

  • Provide engineering support for our customer support team, investigating and resolving production issues with empathy and speed
  • Monitor infrastructure health through observability tools and metrics, proactively identifying and addressing potential issues
  • Analyze and optimize slow database queries to improve system responsiveness and scalability
  • Tune configuration settings across our platform to improve performance, reliability, and cost-efficiency
  • Build and improve internal tooling to support deployment, monitoring, and developer productivity
  • Bring a thoughtful approach to incident response, root cause analysis, and documentation of postmortems

Preferred Qualifications

Prior experience in these is helpful but not required. A strong ability to learn quickly and work across the stack is essential.

Benefits

  • Excellent health, dental, and vision coverage
  • 401K
  • Unlimited vacation days
