Software Reliability Engineer

Canva

Remote - Australia

Summary

Join Canva's Observability Traces & Exceptions Team and help redefine how the world experiences design. Based in Sydney with options for remote work in other Australian locations, you will be responsible for building and improving the company's observability platform and tooling. This role requires proficiency in Python, Java, or Golang, deep knowledge of computer engineering fundamentals, and experience with AWS and Kubernetes. You will provide technical leadership, brainstorm solutions, and advocate for best practices in observability. Canva offers a range of benefits including equity packages, inclusive parental leave, a wellbeing allowance, and flexible leave options.

Requirements

  • Be proficient and happy to code in Python, Java or Golang
  • Have deep knowledge and understanding of Computer Engineering fundamentals and first principles
  • Have a solid knowledge of AWS (EC2, EKS, Lambda, SQS, Kinesis, S3) or equivalent
  • Have experience deploying and running containerized workloads on a platform like Kubernetes
  • Have experience with observability tooling, including competency with tools like Elasticsearch, Grafana, Sentry, Jaeger or similar
  • Have experience running highly available and reliable distributed systems, with highly scalable data stores
  • Be proficient with infrastructure-as-code - we’re a Terraform shop, but strong experience with other IaC tools will do the trick

Responsibilities

  • Be responsible for building and improving our observability platform and tooling, which is used by all Canva engineers
  • Provide technical leadership and expertise to drive pragmatic solutions and dive into impactful design decisions
  • Brainstorm, research and prototype to optimize our tracing platform, improve our operational effectiveness and increase reliability
  • Be proactive in improving the tracing user experience and advocating for best practices
  • Participate in team ceremonies, knowledge sharing and brainstorming sessions
  • Become an observability champion, evangelising best practices and guiding other Canvanauts in the observability space
  • Find ways to improve the use of traces and provide better insights to our engineers

Preferred Qualifications

  • Have experience with OpenTelemetry because it underpins a lot of the infrastructure and tooling that the team owns
  • Have experience writing application code in Java or frontend code in TypeScript, since we also maintain the tracing libraries
  • Have experience building and running monitoring infrastructure at scale, for example petabyte-scale Elasticsearch clusters or similar databases
  • Have experience with data handling at scale
  • Have experience with ClickHouse
  • Have experience with data security, data obfuscation and PII detection

Benefits

  • Equity packages - we want our success to be yours too
  • Inclusive parental leave policy that supports all parents & carers
  • An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more
  • Flexible leave options that empower you to be a force for good, take time to recharge and support you personally
