Production Support Engineer

Amount
Summary
Join Amount's Production Support Engineering team and ensure efficient management of production issues. You will manage high-priority issues to resolution, troubleshoot and fix technical problems across multiple platforms, and interact with various teams to find optimal solutions. Responsibilities include managing ticket queues, monitoring for issues, post-release validation, and meeting partner SLAs. You will also document events, drive decision-making, ensure stakeholder updates, and perform root cause analysis. The role involves creating runbooks and SOPs, compiling metrics, and participating in on-call rotations for critical incidents. Amount offers a competitive salary, bonuses, equity grants, and benefits.
Requirements
- Technical ability to deep dive into issues by querying tables, analyzing data and problem-solving
- Prioritization and triage of incoming requests/issues
- Drive incident resolution and lead conversations with cross-functional groups
- Ask the right questions to help determine impact/priority and the correct route for resolution. Oversee a technical bridge, if required
- Technical and/or engineering background, ideally with experience writing SQL queries
- Experience working with development teams in a fast-paced environment
- 2 years of experience coordinating and executing major incidents, with demonstrated capacity to lead under pressure
- Previously collaborated with a wide spectrum of internal and external stakeholders
- Worked in an organization with a complex business environment
- Leadership skills with the ability to make quick decisions
- Familiar with ITSM/ITIL concepts
- You thrive as a self-starter who can lead others during stressful situations
- Familiar with tools such as Confluence and Jira, on-call management software such as PagerDuty, and error monitoring software such as Sentry and Kibana
Responsibilities
- Manage high-priority issues to resolution following industry best practices
- Troubleshoot, fix, and apply workarounds to resolve technical issues across multiple platforms
- Manage ticket queues, monitoring for issues and post-release validation
- Meet our partner's SLA requirements
- Management of all incidents through the incident management lifecycle
- Documentation of all relevant events, getting status reports while driving decision-making and resolution
- Ensure stakeholders are updated according to predefined service level agreements
- Completion and ownership of the postmortem with appropriate root cause analysis performed
- Improvement suggestions to capture preventative measures that will avoid recurrences of incidents
- Investigate patterns that indicate larger overall issues, even if we don't have the solution
- Compilation of metrics on a weekly and monthly basis. Maintain dashboards for service incidents and ad hoc reporting as requested
- Play an active role during critical incidents, which may occur outside of normal business hours. Availability on nights, weekends, and holidays as part of an on-call rotation is a must
- Creation of runbooks or standard operating procedures (SOP) so we can all learn from each other and add to our knowledge base
Preferred Qualifications
- Basic knowledge of, or interest in, a programming language such as Java, Python, or Ruby
Benefits
- $63,000-$73,000 base salary
- Amount employees are eligible for annual performance bonuses and equity grants as part of our commitment to shared success!