Senior Site Reliability Engineer at Xero

Summary

Join Xero's Incident and Problem Management team as an experienced SRE professional to build, deliver, and maintain robust incident management processes and tooling. You will drive enduring reliability by responding quickly to high-severity incidents, building a world-class process, and leading technical discussions to identify and track actions during incidents. You will dig into the causes of incidents, proactively examine potential future incidents, and work with engineering teams to remove risks. You will also build playbooks and automation for quick responses and provide ongoing training.

In this role you will serve as a Technical Duty Officer (TDO), driving fast mitigation and resolution of impactful events. The position requires strong technical skills, SRE experience, and excellent communication abilities.

Requirements

Previous experience as a Site Reliability Engineer in an Operations or Engineering environment
Strong coding experience (preferably with Python)
Hands-on experience troubleshooting AWS hosted services
Networking knowledge and the ability to troubleshoot TCP/IP, SSL/TLS, DNSSEC, IPsec, and BGP issues
Strong oral and written communication skills, including the ability to translate technical issues and concepts into agreed actions

Responsibilities

Own the incident management process, ensuring it drives enduring reliability across all products and services within Xero
Provide expert leadership during critical outages, coordinating multiple teams to ensure streamlined decision-making and quick resolution
Lead and advocate for the transformation to a world-leading SRE organization, promoting SRE principles within the Engineering Department
Promote a customer-focused approach by addressing and mitigating issues in customer environments globally, and foster a culture of continuous learning and technical excellence within the SRE team
Develop and implement scalable process frameworks and observability strategies to ensure rapid problem diagnosis, response, and service reliability
Collaborate with product teams to thoroughly analyze failures and integrate insights to improve service reliability, scalability, and operational efficiency

Benefits

Very generous paid leave to use however you’d like (plus statutory holidays!)
Dedicated paid leave to care for your physical and mental wellbeing, plus an Employee Assistance Program giving you and your family access to mental health care
Health insurance
Life insurance
Income protection
Wellbeing and sports programmes, and employee resource groups
26 weeks of paid parental leave for primary caregivers
An Employee Share Plan
Beautiful offices
Flexible working
Career development
Many other benefits that reflect our human value

Remote

DevOps

Senior