Senior Software Reliability Engineer

Canva
Summary
Join Canva's Reliability Platform Group as a Reliability Engineer and help redefine how the world experiences design. Based in Sydney, with flexible work arrangements available across multiple Australian locations, you will design and implement processes, tools, and automation to improve service reliability. You will collaborate with product engineering teams to ensure best practices are adopted across the organization, foster a reliability-first culture, investigate production incidents, and research solutions for Canva's distributed cloud infrastructure. This role requires advanced coding proficiency, extensive experience with complex web applications, and a strong understanding of observability principles. Canva offers a range of benefits including equity packages, inclusive parental leave, wellbeing allowances, and flexible leave options.
Requirements
- Have advanced coding proficiency in Python, Java, or Go and strong object-oriented programming fundamentals
- Have 5+ years of commercial experience developing complex, distributed web applications
- Have experience diagnosing and addressing issues across the "full stack", including front-end code, backend, network/infrastructure, and data layer
- Have a solid understanding of observability principles, such as metrics, logs, tracing, synthetic testing, query construction, dashboarding, and alerting
- Have experience guiding others in the principles of incident review, investigation, and remedial activity
- Have disciplined coding practices, experience with code reviews and pull requests, and a creative and conceptual problem-solving approach
- Have strong communication and team collaboration skills, both written and verbal
Responsibilities
- Design and implement processes, tools, automation, and libraries that service teams can use to improve the reliability of the services they own
- Work with product engineering teams to ensure reliability best practices and tools are rolled out in every service across the whole organization
- Foster a culture within the Engineering org that puts reliability first and establishes processes and policies that drive reliability within product engineering teams
- Conduct deep investigations into production incidents and follow up by applying the learnings to code
- Research, develop, and justify the best choices in the form of design docs for tools and processes that will shape the future of reliability at Canva
- Propose new approaches and solutions to ensure we future-proof Canva's distributed cloud infrastructure as we scale
- Participate in design meetings, hiring interviews, and code reviews
Preferred Qualifications
- Have experience in Java
- Have experience working with microservice architectures in large containerised, distributed cloud environments (ideally AWS)
- Have experience working with data warehouse, analytics and reporting tools such as Snowflake, Mode Analytics and Looker
Benefits
- Equity packages
- Inclusive parental leave policy
- An annual Vibe & Thrive allowance
- Flexible leave options