AKASA is hiring a Sr. Site Reliability Engineer

💵 $145k-$200k
📍 Remote - Worldwide

Summary

AKASA, a fast-growing healthcare AI startup, is hiring a Senior Site Reliability Engineer (SRE). The role involves managing and optimizing infrastructure, developing runbooks for incident management, building visualizations and alerting systems, troubleshooting production issues, and catching potential problems before they become outages.

Requirements

  • Proficient in visualizing, monitoring, and alerting on telemetry data using tools such as Grafana, Prometheus/Mimir, OpenSearch, Sentry, and similar technologies
  • Experience with Docker, Kubernetes, Terraform, or similar technologies
  • 5+ years of professional experience using Python, Go, Java, or a similar language
  • Proficient with Linux and Unix Shell
  • Excellent collaboration and asynchronous communication skills
  • Committed to thorough documentation to streamline learning and processes
  • Proactive and enthusiastic attitude towards identifying and fixing issues
  • Ability to deliver quickly, iterate fast, and adapt to changing requirements
  • Proficient in using Git/GitHub for version control

Responsibilities

  • Lead an on-call rotation (PagerDuty) to respond to incidents impacting system availability
  • Dive deep into our application architectures and work with engineering teams on best practices for monitoring, reliability, and scalability
  • Manage our infrastructure using Terraform, GitHub CI/CD, and Kubernetes
  • Develop monitoring solutions that alert based on symptoms rather than outages
  • Document every action to turn findings into repeatable processes and automation
  • Enhance operational processes (such as deployments and upgrades) to ensure reliability and efficiency
  • Design, build, and maintain core infrastructure to support our applications effectively
  • Troubleshoot and resolve production issues across various services and levels of the stack
  • Strategically plan and scale AKASA's monitoring

Preferred Qualifications

  • Experience with AWS (preferred), Google Cloud, or Azure
  • Understanding of networking principles and protocols
  • Knowledge of security best practices in infrastructure management
  • Experience in performance tuning and optimization

Benefits

  • Unlimited paid time off (PTO)
  • Expansive coverage for health, dental, and vision
  • Employer contribution to Health Savings Accounts (HSA)
  • Generous parental leave policy
  • Full employee coverage for life insurance
  • Company-paid holidays
  • 401(k) plan
