Senior Staff Software Engineer, Reliability Engineering at Airbnb

Summary

Join Airbnb's Site Reliability Engineering team as a Senior Staff Engineer and play a key role in developing and implementing a best-in-class, enterprise-wide SRE program. You will drive the development of a long-term reliability strategy, ensuring the performance and reliability of Airbnb's infrastructure and products, and collaborate with engineering teams to provide tools and expertise for reliable services. As a senior technical individual contributor, you will solve broad technical challenges, lend expertise to specific teams, and contribute code and/or participate in architecture and design. A typical day involves developing roadmaps, designing SRE architecture, creating incident management processes, fostering the SRE model, and bringing a customer focus to reliability. You will also build partnerships, learn from incidents, mentor other engineers, and create a culture of reliability.

Requirements

BS, MS, or PhD in computer science or a related field, or equivalent work experience
12+ years of software engineering experience, with a significant portion dedicated to system architecture and design in consumer-facing technology companies
Strong leadership skills, with 5+ years of experience as a senior-level technical lead or architect, driving the technical direction and strategy across multiple teams or projects
Excellent communication and collaboration skills, with a proven track record of working effectively across teams and organizations
Demonstrated expertise in building and scaling high-availability systems and platforms, with a deep understanding of multi-cloud environments

Responsibilities

Develop a roadmap with a longer-term vision for Reliability and serve as a strategic thought partner within the organization
Design, implement, and influence company-wide SRE architecture, innovation, engineering, and standards
Create incident management processes that can scale with the organization as it continues its rapid growth. Assess how the organization manages incidents and responds to them; reduce operational toil stemming from incident management
Foster an SRE/Reliability model that accounts for the nuances of an engineering culture with a strong sense of ownership over its services
Bring a strong customer focus to the Reliability function, centered on optimizing the infrastructure and platform, and ensuring systems are highly available and performant
Develop Production Readiness standards to ensure service reliability. Automate as much as possible and always configure as code. Predict future failures and work proactively to mitigate them. Advocate and implement reliable design patterns (circuit breakers, graceful degradation, etc.)
Create a culture where Reliability is a state of mind, instilling a proactive approach to seeing patterns and opportunities to increase leverage and tooling
Build deep partnerships with engineering leaders. Work closely with product engineering teams on design and implementation choices of large-scale distributed systems
Partner with the broader organization to learn from incidents through a blameless postmortem process
Mentor and lead other Site Reliability Engineers. Uplevel and support others with servant leadership, mentorship, advocacy, and allyship

Benefits

This role may also be eligible for bonus, equity, benefits, and Employee Travel Credits

Remote

Software Development

Senior
