Remote Senior Incident Response Manager
Stripe
Remote - Worldwide
Job highlights
Summary
Join Stripe's Incident Ops team as an Incident Response Manager (IRM) to drive incident response and management from detection to resolution, ensuring the company's five-nines (99.999%) reliability.
Requirements
- 5+ years of demonstrable major incident experience for organizations that run mission-critical applications or always-on SaaS environments
- Demonstrated ability to lead multiple incidents concurrently with authority, and to influence responders, using agency and reasoning skills to resolve ambiguous problems and drive to root cause
- Strong full-stack technical skills, with development or support experience in cloud-based technologies
- Demonstrated experience developing code and automation using Python, Ruby, JavaScript or shell scripting
- Solid understanding of infrastructure, including physical, virtual, and container-based compute platforms
- Strong quantitative and analytical skills in data manipulation using SQL, Splunk, or other tools
- Excellent task management skills; must be detail-oriented, with the ability to remain composed, methodical, and quick-thinking in a high-pressure environment
- Exceptional written and verbal English communication skills, with the ability to translate complex technical issues for internal and external stakeholders
Responsibilities
- Act as an on-call Incident Commander, responsible for driving and managing incident resolution with a high level of urgency, cross-functional collaboration, and accuracy
- Lead all user-facing incidents across domains at Stripe - including reliability, technical, security, and data privacy
- Use a 'User First' approach to determine impact, providing accurate situation reports, facilitating comms bridges, and ensuring useful and timely external communications to users
- Proactively update internal stakeholders, make decisions through data, and influence outcomes by partnering with Engineering, Sales, Support, and other cross-functional teams
- Contribute to the root cause analysis process by conducting post-mortems, identifying remediations, and ensuring problem management tasks meet SLA and user expectations
- Drive improvements in the incident handling process and incident management metrics and tooling based on trends and data of Stripe's incidents in collaboration with engineering, product and operations teams
- Collaborate closely with leadership to build team strategy based on the team vision
- Collaborate and coach other Incident Response Managers on the team
Preferred Qualifications
- Domain expertise in classes of incidents such as technical, privacy, security or crisis with a strong desire to continuously learn about Stripe's products, technical issues and systems
- Ability to review complex technical details regarding ongoing issues/events and convey the key details to senior stakeholders to facilitate real-time decision making
- Experience with broad user-facing communications (e.g. status pages, tweets) and/or targeted communications (e.g. direct emails, support ticket responses)
- Familiarity with operating or managing distributed architectures, with the ability to correlate system behaviors based on known inter-dependencies
- Demonstrated experience with full-stack development and support
This job is filled or no longer available