Senior Site Reliability Engineer II

Instacart

πŸ“Remote - Canada

Summary

Join Instacart's team as a Senior Site Reliability Engineer II and play a crucial role in maintaining the platform's operational backbone. You will ensure optimal performance and growth while fostering a culture of effective reliability. This role requires expertise in addressing complex issues and exploring innovative solutions. The Site Reliability Engineering (SRE) team focuses on optimizing systems, building robust infrastructure, and automating processes. You will work with a team that values intellectual curiosity, problem-solving, and collaboration. This is a remote position with a competitive compensation and benefits package.

Requirements

  • Proven experience in programming
  • Robust knowledge of incident management processes and tools
  • Exemplary troubleshooting and problem-solving skills
  • Ability to work under pressure and prioritize tasks during high-stress situations
  • Expertise in scaling application infrastructure for high availability

Responsibilities

  • Develop scalable infrastructure strategies that ensure high availability, align infrastructure planning with product roadmaps, and optimize cost, risk, and performance with cloud providers
  • Establish and lead incident management protocols and response plans to coordinate rapid responses, investigate root causes, prevent recurrence, and collaborate with security teams to test response readiness and address security risks
  • Continuously monitor performance metrics and trends to proactively identify reliability risks. Regularly refine SLOs, SLIs, and error budgets to align with evolving standards, and use data insights to propose improvement plans and architectural updates that enhance system reliability
  • Oversee regular system evaluations to pinpoint and refine process shortcomings and lead cross-functional projects that promote system optimization and minimize technical debt. Collaborate with product and engineering teams to ensure system enhancements align with user requirements
  • Design and deploy automation tools to streamline deployment and operations. Oversee the continuous enhancement of automation scripts and frameworks, rigorously monitor automated systems for performance and reliability, and address issues in automated environments promptly to reduce disruptions
  • Provide technical guidance to junior colleagues, fostering a collaborative culture for problem-solving and innovation. Organize and lead knowledge-sharing sessions and coordinate training in site reliability best practices to enhance team proficiency

Preferred Qualifications

  • Proficient in Ruby or Go
  • Experience with cloud platforms (e.g., AWS, GCP, Azure) and containerization (e.g., Docker, Kubernetes)
  • Skill in risk assessment for foundational infrastructure changes
  • Experience in monitoring system performance and trend analysis

Benefits

  • Instacart provides highly market-competitive compensation and benefits in each location where our employees work
  • This role is remote
  • Additionally, this role is eligible for a new hire equity grant as well as annual refresh grants
