
Incident Commander

Penn Interactive
Summary
Join PENN Entertainment’s digital team as an Incident Commander and be the frontline for incidents, working with Application and Reliability teams to prevent future events. You will be responsible for all engineering and some non-engineering incidents (P1, P2, P3, and P4), classifying and documenting them, providing support, and driving resolution. This involves hierarchical and technical escalation, working with engineers on diagnosis, recovery, and root cause analysis. You will also drive improvements to service delivery and release processes. The role requires collaboration with various teams, including Command Support, Customer Support, Application teams, and SRE teams. You will lead initiatives to improve internal frameworks, processes, and tools, and deliver incident communications to stakeholders.
Requirements
- Experience in an incident management or similar role
- Experience with and understanding of containerization (Docker and Kubernetes preferred)
- Comfortable working in Linux environments
- Experience working with AWS, GCP, and/or on-premises environments
- Ability to work independently and learn quickly with little supervision
- Ability to handle multiple projects simultaneously
- Willingness to drop everything and take on an ad-hoc task
- Extremely tech-savvy and passionate about learning new technologies and tools
- A bachelor’s degree in computer science or engineering, or equivalent experience
Responsibilities
- Drive and enhance collaboration with other Command Support members and Commanders, Customer Support, Application teams, SRE teams and other cross-functional teams to lead real-time incident management
- Provide leadership in developing practices, frameworks, process flows, templates, and process guides
- Continuously improve and enhance the internal framework, methodology, processes, and tools
- Develop and maintain key practice capabilities
- Collaborate with SRE and Infrastructure teams to identify requirements
- Recommend innovative solutions that enable the organization to deliver on its objectives and goals
- Promote opportunities for Continuous Service Improvements
- Deliver timely incident communications to stakeholders via email, Slack, and Microsoft Teams
- Lead initiatives to promote Jira release ticket management, quality, and alignment with incident management communication in support of SLAs
- Other duties as required
Preferred Qualifications
- Experience with Postgres, MySQL, Elasticsearch, Kafka, Redis, Helmfile, Terragrunt, Prometheus, and any web programming
Benefits
- Competitive compensation package
- Fun, relaxed work environment
- Education and conference reimbursements
- Parental leave top-up
- Opportunities for career progression and mentoring others

