Senior Site Reliability Engineer

Clover Health
Summary
Join Counterpart Health's engineering team as a Senior Site Reliability Engineer and contribute to building and maintaining a scalable infrastructure platform. You will build systems for application and infrastructure lifecycle management, troubleshoot problems, and contribute to team direction. This role requires strong programming skills, in-depth knowledge of containerization and orchestration technologies, and experience with public cloud platforms. You'll work collaboratively with cross-functional teams, fostering a healthy and productive environment. The ideal candidate enjoys working in a fluid environment, taking ownership of priorities, and tackling challenging problems. A competitive salary and comprehensive benefits package are offered.
Requirements
- Have 5+ years of programming experience and are proficient in at least one of the following programming languages: Python, Go, or Shell Script
- Have in-depth knowledge of containerization and orchestration technologies, such as Docker, containerd, and Kubernetes, as well as experience with CNCF-based technologies like Helm, gRPC, and Prometheus
- Have experience with public cloud platforms such as GCP, Azure, or AWS
- Are knowledgeable in networking fundamentals such as TCP/IP, UDP, firewalls, routing, DNS, and load balancing
- Have experience with Linux system administration and basic knowledge of Linux's design
- Understand the key concepts in SRE such as monitoring, performance tuning, and automation
- Are able to work autonomously with limited guidance
- Have excellent communication and collaboration skills, work effectively with cross-functional teams, and adapt quickly to new challenges and technologies
Responsibilities
- Build systems for declarative application and infrastructure lifecycle management: continuous deployment, continuous integration, Kubernetes cluster management, service and workload inventory
- Prioritize and help troubleshoot problems, downtime, and alerts
- Contribute to setting the direction for the Site Reliability Engineering team and establish clear goals that are aligned with Clover's company-level goals
- Foster a healthy, motivated, and interdisciplinary culture that is the bedrock of high-performing teams
- Simplify the process by automating the delivery pipeline and database changes
Preferred Qualifications
- Kubernetes competency is highly valued
Benefits
- Competitive base salary and equity opportunities
- Performance-based bonus program
- 401k matching
- Regular compensation reviews
- Comprehensive medical, dental, and vision coverage
- No-Meeting Fridays
- Monthly company holidays
- Access to mental health resources
- Generous flexible time-off policy
- Remote-first culture
- Learning programs
- Mentorship
- Professional development funding
- Regular performance feedback and reviews