Incident Response Manager

Stripe

πŸ“Remote - Worldwide

Summary

Join Stripe's Incident Ops team as an Incident Response Manager (IRM) and play a crucial role in driving incident resolution. Lead user-facing incidents across various domains, ensuring timely communication and remediation. Collaborate with cross-functional teams to improve incident handling processes. Develop your skills in incident management, communication, and technical understanding of Stripe's products and services. Contribute to a 24/7 global team dedicated to maintaining Stripe's high reliability. This role requires strong incident management experience and excellent communication skills.

Requirements

  • 3+ years of demonstrable major incident experience for organizations that run mission-critical applications or always-on SaaS environments
  • Demonstrated ability to independently lead multiple incidents concurrently with minimal support and guidance from senior team members
  • Basic understanding of application development, architectures, and cloud environments
  • Familiarity with infrastructure concepts, including physical, virtual, and container-based compute platforms
  • Practical experience using modern monitoring and telemetry tools such as Splunk, Prometheus, and Grafana
  • Basic data analysis skills using SQL, Splunk, or other tools
  • Strong task management skills, with attention to detail and ability to remain composed in high-pressure situations
  • Good written and verbal English communication skills, with the ability to translate complex technical issues for various stakeholders

Responsibilities

  • Act as an Incident Commander for incidents across various classes (reliability, technical, data privacy, product, or security), driving incident resolution with urgency and cross-functional collaboration
  • Lead all user-facing incidents across domains at Stripe, including reliability, technical, security, and data privacy
  • Apply a "User First" approach to determine impact, providing accurate situation reports, facilitating comms bridges, and ensuring useful and timely external communications to users
  • Update internal stakeholders and support decision-making processes during incidents
  • Participate in the root cause analysis process, conduct post-mortems for routine incidents, and identify remediations
  • Collaborate with engineering, product, and operations teams to improve incident handling processes and tooling
  • Contribute to team culture and processes that enhance incident response capabilities

Preferred Qualifications

  • Familiarity with different types of incidents, such as technical, privacy, security, or crisis, and an eagerness to continually learn about Stripe's products and systems
  • Experience in conveying key details of technical issues to stakeholders
  • Experience with broad public-facing communications (e.g. status pages, tweets) and/or targeted communications (e.g. direct emails, support ticket responses)
  • Familiarity with distributed architectures and system interdependencies that operate in a cloud environment
