Incident Response Manager

closed

Stripe

📍 Remote - Worldwide

Summary

Join Stripe's Incident Ops team as an Incident Response Manager (IRM) and play a crucial role in driving incident resolution. Lead user-facing incidents across various domains, ensuring timely communication and remediation. Collaborate with cross-functional teams to improve incident handling processes. Develop your skills in incident management, communication, and technical understanding of Stripe's products and services. Contribute to a 24/7 global team dedicated to maintaining Stripe's high reliability. This role requires strong incident management experience and excellent communication skills.

Requirements

  • 3+ years of demonstrable major incident management experience at organizations that run mission-critical applications or always-on SaaS environments
  • Demonstrated ability to independently lead multiple incidents concurrently with minimal support and guidance from senior team members
  • Basic understanding of application development, architectures, and cloud environments
  • Familiarity with infrastructure concepts, including physical, virtual, and container-based compute platforms
  • Practical experience using modern monitoring and telemetry tools such as Splunk, Prometheus, and Grafana
  • Basic data analysis skills using SQL, Splunk, or other tools
  • Strong task management skills, with attention to detail and ability to remain composed in high-pressure situations
  • Good written and verbal English communication skills, with the ability to translate complex technical issues for various stakeholders

Responsibilities

  • Act as an Incident Commander for incidents across various classes (reliability, technical, data privacy, product, or security), driving incident resolution with urgency and cross-functional collaboration
  • Lead all user-facing incidents across domains at Stripe - including reliability, technical, security, and data privacy
  • Apply a "User First" approach to determining impact, providing accurate situation reports, facilitating comms bridges, and ensuring useful and timely external communications to users
  • Update internal stakeholders and support decision-making processes during incidents
  • Participate in the root cause analysis process, conduct post-mortems for routine incidents, and identify remediations
  • Collaborate with engineering, product, and operations teams to improve incident handling processes and tooling
  • Contribute to team culture and processes that enhance incident response capabilities

Preferred Qualifications

  • Familiarity with different types of incidents, such as technical, privacy, security, or crisis, and an eagerness to continually learn about Stripe's products and systems
  • Experience in conveying key details of technical issues to stakeholders
  • Experience with broad public-facing communications (e.g. status pages, tweets) and/or targeted communications (e.g. direct emails, support ticket responses)
  • Familiarity with distributed architectures and system interdependencies that operate in a cloud environment

This job is filled or no longer available.