Manager, Incident Management

Caseware

πŸ“Remote - Canada

Summary

Join Caseware's Platform Engineering department as an experienced Incident Manager to build and lead a 24/7 on-call incident commander team. You will manage software incidents using systems like PagerDuty, New Relic, AWS, Microsoft Teams, and Slack. Responsibilities include driving efficient incident resolution, ensuring the right teams are involved, communicating effectively with stakeholders, and developing incident management processes. You will also track uptime metrics, run post-mortem meetings, implement proactive risk mitigation strategies, and foster a collaborative culture. This is a full-time, permanent position offering flexible hybrid or fully remote work options for Canadian residents. Caseware provides competitive compensation, comprehensive benefits, and opportunities for career growth.

Requirements

  • Prior experience in a similar role, preferably within a software or technology company
  • Strong technical background with experience in incident management and response
  • Ability to drive teams to resolve incidents quickly
  • Understanding of complex software landscapes and system integration
  • Excellent written and verbal communication skills
  • Ability to work effectively under pressure and manage multiple priorities

Responsibilities

  • Build and lead an incident management team to respond to software incidents
  • Manage a 24/7 on-call rotation to ensure timely incident response
  • Leverage systems such as PagerDuty, New Relic, AWS, Microsoft Teams, and Slack to monitor and manage incidents
  • Drive teams to resolve incidents quickly and efficiently
  • Understand our software landscape and how systems and teams integrate
  • Ensure the right people are involved in an active incident to facilitate rapid recovery
  • Communicate effectively with cross-functional teams and end-user stakeholders to provide updates and resolutions
  • Develop and implement incident management processes and best practices
  • Track and provide uptime metrics to internal and external stakeholders, ensuring transparency in system reliability and incident recovery performance
  • Organize and run post-mortem meetings following major incidents. Document root causes, lessons learned, and actionable steps to improve processes. Follow up on action items to ensure their completion and track progress to prevent recurrence of similar incidents
  • Implement proactive strategies and tools to mitigate risks and strengthen system resilience

Benefits

  • Innovation is at our core. We work with cutting-edge technology in accounting and financial reporting, constantly pushing the boundaries to create impactful software solutions
  • We are committed to a collaborative culture, where your ideas are valued, and knowledge sharing is encouraged within a supportive, inclusive team
  • Work-life balance is important to us. We offer flexible work options, remote opportunities, and generous time-off policies
  • We offer competitive compensation, including a strong base salary and comprehensive benefits such as health insurance and retirement plans
  • We are driven by impactful work. Your contributions directly affect how our clients manage financial processes and drive their success
  • Recognition and rewards matter to us. We celebrate hard work through recognition programs, performance bonuses, and opportunities for career growth
  • We embrace global opportunities. Work on international projects and collaborate with a diverse, global team