Technical Duty Officer, Incident Commander (SRE)

Xero
Summary
Join Xero's Incident and Problem Management team as a Site Reliability Engineer (SRE). You will own the incident management process, providing expert leadership during critical outages and coordinating multiple teams to resolve them quickly. This role involves leading the transformation to a world-leading SRE organization, promoting SRE principles, and developing scalable process frameworks. You will collaborate with product teams to analyze failures and improve service reliability. The position requires extensive SRE experience, strong technical skills, and excellent communication abilities. Xero offers a comprehensive benefits package, including generous paid leave, health insurance, life insurance, parental leave, and more.
Requirements
- Previous experience as a Site Reliability Engineer in an Operations or Engineering environment
- Hands-on experience troubleshooting AWS hosted services
- Networking knowledge, including the ability to troubleshoot TCP/IP, SSL/TLS, DNSSEC, IPsec, and BGP issues
- Coding experience (preferably Python) building tools, scripting, or automation
- Strong oral and written communication skills, including the ability to translate technical issues and concepts into agreed actions
Responsibilities
- Own the incident management process, ensuring it drives enduring reliability across all products and services within Xero
- Provide expert leadership during critical outages, coordinating multiple teams to ensure streamlined decision-making and quick resolution
- Lead and advocate for the transformation to a world-leading SRE organization, promoting SRE principles within the Engineering Department
- Promote a customer-focused approach by addressing and mitigating issues in customer environments globally, and foster a culture of continuous learning and technical excellence within the SRE team
- Develop and implement scalable process frameworks and observability strategies to ensure rapid problem diagnosis, response, and service reliability
- Collaborate with product teams to thoroughly analyze failures and integrate insights to improve service reliability, scalability, and operational efficiency
Benefits
- Very generous paid leave to use however you’d like (plus statutory holidays!)
- Dedicated paid leave to care for your physical and mental wellbeing
- An Employee Assistance Program to access mental health care for you and your family
- Health insurance
- Life insurance
- Income protection
- Wellbeing and sports programmes
- Employee resource groups
- 26 weeks of paid parental leave for primary caregivers
- An Employee Share Plan
- Beautiful offices
- Flexible working
- Career development