Summary
Join AppOmni as a Senior Site Reliability Engineer (SRE) and ensure the reliability, scalability, and performance of our systems and infrastructure. You will monitor system availability, automate deployment and maintenance, and proactively identify optimization areas. Collaborate with the development team to establish service-level objectives and lead incident response and postmortem analysis. This role requires excellent communication skills and 5+ years of hands-on experience with Python or Golang. AppOmni offers a hybrid work model with hub cities in San Francisco & San Jose (CA), Denver (CO), Lexington (KY), and New York City (NY). We are committed to supporting our employees' financial, professional, and personal well-being.
Requirements
- Excellent technical and non-technical communication skills
- Prior experience as an SRE or in a related discipline, responsible for maintaining high availability of a cloud-based application, troubleshooting performance bottlenecks, configuring monitoring and alerting, and conducting incident response in a blameless environment
- A knack for reducing manual toil through automation and systematic thinking
- Prior experience working with CI/CD tools, processes, and pipelines-as-code (GitHub Actions, CircleCI)
- 5+ years of hands-on experience with Python or Golang
- A solid background in configuration management and infrastructure-as-code (Terraform)
- Solid experience with monitoring/observability systems (Grafana, Prometheus, etc.)
- Demonstrated knowledge of container orchestration (Kubernetes/GKE)
- Experience managing Kubernetes platforms and resources, and using Kubernetes deployment tools and patterns (Helm, GitOps, Knative)
Responsibilities
- Ensure the reliability, scalability, and performance of our systems and infrastructure
- Monitor system availability
- Implement automation for deployment and maintenance tasks
- Proactively identify areas for optimization
- Collaborate with the development team to establish and refine service-level objectives
- Drive incident response and postmortem analysis to minimize service disruptions
Preferred Qualifications
- Experience in FedRAMP or similar secure environments
- Expertise working within highly controlled environments containing sensitive information
- Experience designing and maintaining CI/CD pipelines using commercial solutions
- Experience working on and within GCP and/or AWS
Benefits
- Working remotely
- New hire home office/computer equipment stipend
- Generous paid time off
- Paid company holidays
- Paid floating holidays
- Paid parental leave
- Paid sick time
- Paid family leave for applicable states
- Health insurance - medical, dental, and vision with HSA option
- LifeWorks Employee Assistance Program
- Company-provided life insurance
- AD&D
- STD/LTD and additional supplemental life insurance options
- 401(k) and Roth retirement savings accounts
- A monthly wellness benefit reimbursement