AKASA is hiring a Sr. Site Reliability Engineer
$145k-$200k
Remote - Worldwide
Please let AKASA know you found this job on JobsCollider. Thanks!
Summary
The job is for an SRE at AKASA, a fast-growing healthcare AI startup. The role involves managing and optimizing infrastructure, developing runbooks for incident management, building visualizations and alerting systems, troubleshooting production issues, and identifying potential issues before they become outages.
Requirements
- Proficient in visualizing, monitoring, and alerting on telemetry data using tools such as Grafana, Prometheus/Mimir, OpenSearch, Sentry, and similar technologies
- Experience with Docker, Kubernetes, Terraform, or similar technologies
- 5+ years of professional experience using Python, Go, Java, or a similar language
- Proficient with Linux and the Unix shell
- Excellent collaboration and asynchronous communication skills
- Committed to thorough documentation to streamline learning and processes
- Proactive and enthusiastic attitude towards identifying and fixing issues
- Ability to deliver quickly, iterate fast, and adapt to changing requirements
- Proficient in using Git/GitHub for version control
Responsibilities
- Lead an on-call rotation (PagerDuty) to respond to incidents impacting system availability
- Dive deep into our application architectures and work with engineering teams on best practices for monitoring, reliability, and scalability
- Manage our infrastructure using Terraform, GitHub CI/CD, and Kubernetes
- Develop monitoring solutions that alert based on symptoms rather than outages
- Document every action to turn findings into repeatable processes and automation
- Enhance operational processes (such as deployments and upgrades) to ensure reliability and efficiency
- Design, build, and maintain core infrastructure to support our applications effectively
- Troubleshoot and resolve production issues across various services and levels of the stack
- Strategically plan and scale AKASA's monitoring
Preferred Qualifications
- Experience with AWS (preferred), Google Cloud, or Azure
- Understanding of networking principles and protocols
- Knowledge of security best practices in infrastructure management
- Experience in performance tuning and optimization
Benefits
- Unlimited paid time off (PTO)
- Expansive coverage for health, dental, and vision
- Employer contribution to Health Savings Accounts (HSA)
- Generous parental leave policy
- Full employee coverage for life insurance
- Company-paid holidays
- 401(k) plan