Staff Site Reliability Engineer

Netlify

💵 $122k-$166k
📍 Remote - Worldwide

Summary

Join Netlify's SRE team as a Staff Site Reliability Engineer and champion the architectural vision for reliability systems. You will foster cross-organizational reliability initiatives, cultivate technical standards, and act as a technical authority during major incidents. Collaborate with multiple engineering teams, mentor senior engineers, and design and implement reliability frameworks. Lead architecture reviews, develop reliability metrics, and cultivate relationships with key stakeholders. This remote-first role requires significant SRE experience and expertise in cloud architecture, CI/CD, configuration management, and database management.

Requirements

  • A significant history in Site Reliability Engineering or similar roles, with at least two years leading complex technical projects and mentoring senior engineers
  • Deep expertise in cloud architecture with hands-on experience designing and implementing solutions at global scale in providers such as AWS, GCP, or Azure
  • Proven track record of driving large-scale technical initiatives that span multiple teams and significant portions of infrastructure
  • Proven expertise in designing and managing CI/CD pipelines using tools such as Jenkins, GitLab CI, CircleCI, or similar
  • Deep expertise in configuration management using tools like Ansible, Chef, or Puppet, with a track record of implementing scalable configuration management solutions across large infrastructure footprints
  • Proficiency with Kafka or other messaging brokers, including deployment, scaling, and maintenance within multi-cloud environments
  • Strong experience in database management, including design, optimization, and maintenance of relational and/or NoSQL databases to support scalable and high-performance applications
  • Proficiency in programming and scripting languages like Python, Go, or Bash to develop automation solutions
  • Strong technical leadership skills with experience influencing engineering decisions across multiple teams without direct authority
  • Exceptional communication skills with experience presenting complex technical strategies to executive leadership and driving consensus among diverse stakeholders
  • Comprehensive understanding of reliability engineering principles and the ability to develop frameworks that help organizations make better reliability decisions
  • Experience establishing technical standards and best practices that have been successfully adopted across large engineering organizations
  • Understanding of security best practices and experience working with compliance frameworks including PCI, ISO 27001, HIPAA, or SOC certifications

Responsibilities

  • Champion the architectural vision and technical strategy for Netlify's reliability systems, making pivotal decisions that influence the entire platform's scalability, performance, and operational excellence
  • Foster cross-organizational reliability initiatives, collaborating with multiple engineering teams to implement large-scale infrastructure improvements and standardize SRE practices
  • Define technical standards, best practices, and architectural patterns for reliability that form the foundation for how teams across the organization build and operate systems
  • Act as the technical authority during major incidents, making critical decisions about system trade-offs and providing guidance to multiple teams during complex outages
  • Cultivate and strengthen relationships with key stakeholders across Engineering, Product, and Executive teams to ensure reliability considerations are integrated into company-wide technical strategy
  • Mentor senior engineers and tech leads across multiple teams, helping them develop their systems thinking and reliability engineering capabilities while fostering a culture of operational excellence
  • Design and spearhead the implementation of reliability frameworks and tooling that can be adopted organization-wide, creating scalable solutions that elevate the entire engineering organization's capabilities
  • Lead architecture reviews and provide technical oversight for critical infrastructure projects, ensuring solutions meet both immediate requirements and long-term strategic goals
  • Develop and evangelize reliability metrics and SLO frameworks that align with business objectives, helping teams make data-driven decisions about reliability investments

Preferred Qualifications

  • Candidates based in the UK, Spain, or Poland

Benefits

  • Remote work, flexible hours
  • Equity plan
  • Base compensation for this role is targeted at £96,000 - £130,000 for most UK-based locations
