Production Support Engineer

Amount

💵 $63k-$73k
📍 Remote - United States

Summary

Join Amount's Production Support Engineering team and ensure efficient management of production issues. You will manage high-priority issues, troubleshoot technical problems across multiple platforms, and interact with various teams to find optimal solutions. Responsibilities include incident management, documentation, stakeholder updates, postmortem analysis, and metric compilation. You will also contribute to process improvement and participate in on-call rotations. Amount offers a competitive salary and benefits package.

Requirements

  • Technical and/or engineering background, ideally with experience writing SQL queries
  • Experience working with development teams in a fast-paced environment
  • 2 years of experience coordinating and executing major incident response, with a demonstrated ability to lead under pressure
  • Experience collaborating with a wide spectrum of internal and external stakeholders
  • Experience working in an organization with a complex business environment
  • Leadership skills with the ability to make quick decisions
  • Familiarity with ITSM/ITIL concepts
  • Ability to thrive as a self-starter who can lead others in stressful situations
  • Familiarity with tools such as Confluence, Jira, and on-call management software (e.g., PagerDuty), plus experience with error monitoring software (e.g., Sentry, Kibana)

Responsibilities

  • Manage high-priority issues to resolution following industry best practices
  • Troubleshoot, fix, and apply workarounds to resolve technical issues across multiple platforms
  • Manage ticket queues, monitor for issues, and perform post-release validation to meet partner SLA requirements
  • Dive deep into issues by querying tables, analyzing data, and problem-solving
  • Prioritize and triage incoming requests/issues
  • Drive incident resolution and lead conversations with cross-functional groups. Ask the right questions to help determine impact/priority and the correct route for resolution. Oversee a technical bridge, if required
  • Manage all incidents through the incident management lifecycle
  • Document all relevant events and gather status reports while driving decision-making and resolution
  • Ensure stakeholders are updated according to predefined service level agreements
  • Own and complete the postmortem, including an appropriate root cause analysis
  • Suggest improvements and preventive measures that will help avoid recurrences of incidents
  • Investigate patterns that indicate larger overall issues, even if we don't yet have a solution
  • Compile metrics on a weekly and monthly basis. Maintain dashboards for service incidents and ad hoc reporting as requested
  • Play an active role during critical incidents, which may occur outside normal business hours. Participation in an on-call rotation covering nights, weekends, and holidays is a must
  • Create runbooks or standard operating procedures (SOP) so we can all learn from each other and add to our knowledge base

Preferred Qualifications

  • Basic knowledge of, or interest in, a programming language such as Java, Python, or Ruby

Benefits

  • Salary: $63,000-$73,000 base salary
  • Benefits & Perks
