Senior Chaos Engineer

Goodnotes

📍 Remote - China, United Kingdom

Summary

Join Goodnotes, a leading digital productivity platform, as their first Chaos Engineer. You will define and implement a comprehensive chaos engineering program from the ground up, designing and executing experiments to identify and mitigate system vulnerabilities. This role involves designing controlled failure scenarios, building automation tools, collaborating with engineering teams, and fostering a culture of reliability. You will work across mobile and backend systems, simulating real-world issues and analyzing system behavior under stress. The ideal candidate possesses proven chaos engineering experience in distributed systems, strong Swift programming skills, and a deep understanding of resilience patterns. This is a unique opportunity to shape the future of chaos engineering at Goodnotes.

Requirements

  • Proven experience with chaos engineering or fault injection, ideally in distributed, production-scale environments
  • Comfortable with iOS platforms, mobile networking, and understanding how client-side failures impact backend systems
  • Strong experience with Swift programming
  • Strong understanding of resilience patterns (e.g., circuit breakers, bulkheads, timeouts, retries) and system failure modes
  • Prior involvement in incident postmortems, war games, or reliability reviews
  • Comfortable building tools or scripts to automate chaos experiments and analyse system behavior under stress
  • With your scientific mindset, you love forming hypotheses, testing limits, and uncovering how systems really behave at the edge
  • You're excited to build a program from scratch, not just join one

Responsibilities

  • Define the chaos engineering strategy at Goodnotes, including tools, safety practices, and long-term roadmap
  • Design and run fault injection experiments across mobile and backend systems, targeting failure points in user flows, APIs, and infrastructure components to surface hidden risks
  • Simulate real-world issues like latency spikes, dependency outages, cascading failures, and resource exhaustion
  • Build and scale tooling for automating experiments, tracking outcomes, and improving observability
  • Establish clear guardrails and blast radius controls to ensure experiments are safe, measured, and reversible
  • Collaborate across engineering teams to identify critical flows, formulate hypotheses, and stress-test assumptions
  • Facilitate resilience drills and chaos game days, driving cross-team engagement and response readiness
  • Document findings, communicate insights, translate chaos learnings into actionable improvements, and influence our engineering teams to enact recommended changes
  • Help shape the future of the chaos engineering function, including mentoring and hiring as the team grows

Benefits

  • Remote friendly
  • Flexible working hours and location
  • Medical insurance for you and your dependents
  • Great annual leave allowance
  • Meaningful equity in a profitable tech startup
  • Budget for things like noise-cancelling headphones, setting up your home office, personal development, professional training, and health & wellness
  • Sponsored visits to our Hong Kong or London office every 2 years
  • Company-wide annual offsite
  • Fantastic maternity/paternity packages and allowances
