Staff Software Engineer, Reliability Engineering at Airbnb

Summary

Join Airbnb's Site Reliability Engineering team as a Staff Software Engineer and develop, maintain, and improve the tools and systems that keep services reliable and scalable. Collaborate with engineering teams to implement best practices, participate in incident response, and lead high-urgency incident management. Apply your expertise in distributed systems, cloud computing, and software development to improve operational efficiency and support growth. As an essential member of the first-responder SRE team, you will guide cross-functional teams during critical events, ensuring timely resolution and minimizing customer impact. This role requires strong technical skills, excellent communication, and a commitment to continuous learning. The position is US-Remote Eligible, with occasional office work or offsite attendance.

Requirements

Bachelor's degree in Computer Science or related field
9+ years of experience in software engineering or SRE roles, with a focus on large-scale distributed systems
Strong coding skills in at least one programming language, such as Java, Python, or Go
Experience with distributed systems and service-oriented architectures
Experience with cloud computing platforms such as AWS or Google Cloud Platform
Strong commitment to software development best practices, including version control, automated testing, and continuous integration and delivery
Experience with containerization technologies such as Docker and Kubernetes
Excellent problem-solving and analytical skills, with strong attention to detail
Ability to work effectively in a fast-paced and dynamic environment
Strong communication and interpersonal skills

Responsibilities

Design, implement, and maintain the tools and systems that support service reliability, monitoring, and alerting
Collaborate with other engineering teams to ensure services are designed with reliability in mind, and provide guidance on the appropriate use of tooling and automation
Identify opportunities to improve the reliability, scalability, and efficiency of our services and drive their implementation
Work with SREs to understand the challenges they face in operating our services and develop tools and systems to help them manage these challenges
Participate in incident response and post-mortems to identify and address systemic issues
Continuously evaluate new technologies and industry best practices to improve our SRE tooling and incident response procedures
Gain and maintain an intimate understanding of how the critical parts of the site work (services, infrastructure, tooling, and processes)
Lead high-urgency incident management and mentor less-experienced team members in effectively handling incidents
Contribute to better incident retrospectives, driving improvements in our overall reliability and incident response time

Benefits

Bonus
Equity
Benefits
Employee Travel Credits
