Site Reliability Engineer II
Earnest
Job highlights
Summary
Join Earnest, a company dedicated to making higher education more accessible and affordable. As a Site Reliability Engineer II, you will play a crucial role in ensuring the reliability and scalability of our systems. You will set up monitoring, develop infrastructure as code (IaC), implement tools to measure service level objectives, agreements, and indicators (SLOs, SLAs, and SLIs), and automate infrastructure. This role requires 3+ years of experience in SRE or a similar field, along with hands-on experience with cloud providers (AWS), containerization, CI/CD, and observability tools. The position offers a competitive salary, remote work flexibility with monthly in-office collaboration, and a comprehensive benefits package including health insurance, a retirement plan, paid time off, and more. Earnest fosters a culture of growth, humility, and ownership, making it an ideal environment for driven and collaborative individuals.
Requirements
- 3+ years of professional experience in Site Reliability Engineering or a similar role, with a focus on infrastructure, automation, and system reliability
- Hands-on experience with cloud providers (AWS), containerization (Kubernetes, Docker), CI/CD pipelines, and observability tools (e.g., Prometheus, Grafana, New Relic, or Splunk)
- Willing to travel to the Oakland office monthly to engage with team members and strengthen collaboration
Responsibilities
- Set up and maintain comprehensive monitoring, create and refine playbooks, build dashboards, and adopt industry-standard practices to enhance the reliability and resilience of our site and systems
- Develop and manage IaC to ensure reliable, scalable, and high-performance systems, reducing configuration drift and enabling rapid recovery
- Implement and maintain both in-house and SaaS-based tools to measure SLOs, SLAs, and SLIs, ensuring we meet our reliability targets and provide transparency into system health
- Identify opportunities for automation across the infrastructure to minimize manual interventions, streamline operations, and improve response times
- Participate in on-call rotations, respond to incidents, conduct root cause analyses, and contribute to post-incident reviews to drive improvements
- Work closely with cross-functional teams to enhance system design, support code deployments, and optimize system performance
Preferred Qualifications
- Passionate about seeking opportunities to innovate and implement changes that enhance system reliability and client satisfaction
- Champions self-service infrastructure solutions to empower development teams and accelerate deployment cycles
- Embodies continuous improvement and is committed to driving projects beyond "good enough" toward operational excellence
- Proactively identifies potential issues and implements preventive measures to ensure consistent system uptime
- Able to clearly document processes and communicate with technical and non-technical stakeholders to ensure alignment
Benefits
- Health, Dental, & Vision benefits plus savings plans
- Mac computers + work-from-home stipend to set up your home office
- Monthly internet and phone reimbursement
- Employee Stock Purchase Plan
- Restricted Stock Units (RSUs)
- 401(k) plan to help you save for retirement plus a company match
- Robust tuition reimbursement program
- $1,000 travel perk on each Earnie-versary to anywhere in the world
- Competitive annual PTO
- Competitive parental leave