Senior Software Reliability Engineer

Canva

📍Remote - Australia

Summary

Join Canva's Reliability Platform Group as a Reliability Engineer and help redefine how the world experiences design. Based in Sydney with options for flexible work arrangements across multiple Australian locations, you will design and implement processes, tools, and automation to improve service reliability. Collaborate with product engineering teams to ensure best practices are implemented across the organization, fostering a reliability-first culture. Investigate production incidents, research and develop solutions, and participate in design meetings and code reviews. This role requires advanced coding proficiency (Python/Java/GoLang), 5+ years of experience with complex web applications, and a strong understanding of observability principles. Canva offers equity packages, inclusive parental leave, wellbeing allowances, and flexible leave options.

Requirements

  • You have advanced coding proficiency in Python, Java, or Go and strong object-oriented programming fundamentals
  • You have five-plus (5+) years of commercial experience developing complex, distributed web applications
  • You have experience diagnosing and addressing issues across the “full stack”, including front-end code, backend, network / infrastructure and data layer
  • You have a solid understanding of observability principles, such as metrics, logs, tracing, synthetic testing, query construction, dashboarding and alerting
  • You have experience guiding others in the principles of incident review, investigation and remedial activity
  • You have disciplined coding practices, experience with code reviews and pull requests, and a creative and conceptual problem-solving approach
  • You have strong communication and team collaboration skills, both written and verbal. As a reliability engineer, you will need to share knowledge and communicate and coordinate changes across multiple service teams

Responsibilities

  • Designing and implementing processes, tools, automation, and libraries that service teams can use to improve the reliability of the services they own. For instance, adding a long-awaited feature to our circuit breaker library
  • Working with product engineering teams to ensure reliability best practices and tools are rolled out in every service across the whole organization. It’s not enough to create a new throttling library; we want to make sure it’s successfully used in every service
  • Fostering a culture within the Engineering org that puts reliability first and establishes processes and policies that drive reliability within product engineering teams. This includes things like SLAs, error budgets, on-call response, incident resolution, and observability best practices
  • Investigating production incidents in depth and applying what is learned to the code
  • Researching, developing, and justifying the best choices in the form of design docs for tools and processes that will shape the future of reliability at Canva
  • Proposing new approaches and solutions to ensure we future-proof Canva’s distributed cloud infrastructure as we scale
  • Participating in design meetings, hiring interviews, and code reviews

Preferred Qualifications

  • Our services and libraries are primarily written in Java 13, so experience in Java is a nice-to-have
  • Our platform and infrastructure tooling is primarily written in Python, Go and Terraform
  • Experience working with microservice architectures in large containerised, distributed cloud environments (ideally AWS). We’re hosted on AWS and leverage the tools they provide as much as possible
  • Experience working with data warehouse, analytics and reporting tools such as Snowflake, Mode Analytics and Looker

Benefits

  • Equity packages - we want our success to be yours too
  • Inclusive parental leave policy that supports all parents & carers
  • An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more
  • Flexible leave options that empower you to be a force for good, take time to recharge and support you personally
