Site Reliability Engineer II

Earnest

💵 $155k-$175k
📍 Remote - United States

Summary

Join Earnest, a company dedicated to making higher education more accessible and affordable. As a Site Reliability Engineer II, you will play a crucial role in ensuring the reliability and scalability of our systems. You will be responsible for setting up monitoring, developing infrastructure as code (IaC), implementing tools to measure SLOs, SLAs, and SLIs, and automating infrastructure. This role requires 3+ years of experience in SRE or a similar role, plus hands-on experience with cloud providers (AWS), containerization, CI/CD, and observability tools. The position offers a competitive salary, remote work with monthly travel to the Oakland office for in-person collaboration, and a comprehensive benefits package including health insurance, a retirement plan, paid time off, and more. Earnest fosters a culture of growth, humility, and ownership, making it an ideal environment for driven and collaborative individuals.

Requirements

  • 3+ years of professional experience in Site Reliability Engineering or a similar role, with a focus on infrastructure, automation, and system reliability
  • Hands-on experience with cloud providers (AWS), containerization (Kubernetes, Docker), CI/CD pipelines, and observability tools (e.g., Prometheus, Grafana or New Relic/Splunk)
  • Willing to travel to the Oakland office monthly to engage with team members and strengthen collaboration

Responsibilities

  • Set up and maintain comprehensive monitoring, create and refine playbooks, build dashboards, and adopt industry-standard practices to enhance the reliability and resilience of our site and systems
  • Develop and manage IaC to ensure reliable, scalable, and high-performance systems, reducing configuration drift and enabling rapid recovery
  • Implement and maintain both in-house and SaaS-based tools to measure SLOs, SLAs, and SLIs, ensuring we meet our reliability targets and provide transparency into system health
  • Identify opportunities for automation across the infrastructure to minimize manual interventions, streamline operations, and improve response times
  • Participate in on-call rotations, respond to incidents, conduct root cause analyses, and contribute to post-incident reviews to drive improvements
  • Work closely with cross-functional teams to enhance system design, support code deployments, and optimize system performance

Preferred Qualifications

  • Passionate about seeking opportunities to innovate and implement changes that enhance system reliability and client satisfaction
  • Champions self-service infrastructure solutions to empower development teams and accelerate deployment cycles
  • Embodies continuous improvement and is committed to driving projects beyond "good enough" toward operational excellence
  • Proactively identifies potential issues and implements preventive measures to ensure consistent system uptime
  • Able to clearly document processes and communicate with technical and non-technical stakeholders to ensure alignment

Benefits

  • Health, Dental, & Vision benefits plus savings plans
  • Mac computers + work-from-home stipend to set up your home office
  • Monthly internet and phone reimbursement
  • Employee Stock Purchase Plan
  • Restricted Stock Units (RSUs)
  • 401(k) plan to help you save for retirement plus a company match
  • Robust tuition reimbursement program
  • $1,000 travel perk on each Earnie-versary to anywhere in the world
  • Competitive annual PTO
  • Competitive parental leave
This job is filled or no longer available
