Summary

Join Updater's Production Support team and revolutionize moving experiences through operational excellence. As an Associate Production Support Engineer, you will safeguard revenue-generating systems and grow into a reliability engineering partner who prevents incidents rather than only responding to them. You will work with a dedicated team as the company shifts from reactive incident response to a proactive reliability engineering practice. Over the next year, your primary mission is to enhance the team's operational capabilities and master the foundations of a reliability engineering role. The role balances operational excellence with growth responsibilities, focusing on incident management, communication, and SRE development.

Requirements

2+ years of experience troubleshooting production systems, networks, or applications
Understanding of fundamental programming concepts and basic scripting abilities
Experience with SQL and relational databases for data analysis and troubleshooting
Familiarity with API testing tools (Postman) and web service troubleshooting
Basic understanding of Git concepts and collaborative development workflows
AWS Certified Cloud Practitioner level knowledge or equivalent cloud platform experience
Proven ability to work effectively under pressure during system outages or critical incidents
Experience with ticketing systems and structured escalation procedures
Strong problem-solving skills with the ability to analyze complex technical issues
Excellent time management and ability to prioritize multiple concurrent issues
Professional communication style with both technical and non-technical stakeholders
Experience providing status updates and technical explanations during incident response
Ability to write clear documentation and incident reports
Comfort participating in bridge calls and leading technical discussions
Bachelor's degree in Computer Science, Information Technology, Engineering, or a related technical field, or equivalent work experience (an additional 2+ years of hands-on technical experience in lieu of a degree)
2+ years of experience in technical support, system administration, DevOps, or related operational roles
Demonstrated ability to learn new technologies quickly and adapt to changing technical environments

Responsibilities

Monitor critical production systems that directly generate company revenue using Datadog dashboards and synthetic tests
Respond to incidents with speed and precision, following established escalation procedures to minimize business impact
Manage escalations from internal and external partner call center agents
Partner with Updater engineering, support teams, and our providers to resolve escalated issues
Participate in 24x7 on-call rotation, ensuring someone is always watching our systems
Triage and resolve production issues through Jira workflows, maintaining clear communication with stakeholders
Lead incident response for production outages, coordinating across teams to restore service quickly
Document incidents thoroughly and participate in blameless postmortem processes that focus on system improvement
Communicate effectively with internal teams, external partners, and leadership during high-stress situations
Build relationships across the organization, assuming positive intentions and celebrating team successes
Learn to implement Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for critical services
Develop basic automation scripts to reduce manual operational tasks (Python, Bash, AWS CLI)
Gain familiarity with Infrastructure as Code concepts using Terraform
Collaborate with the DevOps team to understand CI/CD pipelines and deployment automation
Contribute to developer self-service initiatives that reduce operational dependencies
Support the Platform Engineering mission by identifying opportunities to simplify and standardize operational processes
Help develop documentation and "golden path" procedures that enable teams to operate more independently
Participate in cross-team collaboration to advance platform maturity and developer experience
Learn observability best practices including custom metrics, alerting, and Real User Monitoring (RUM)

Preferred Qualifications

Basic familiarity with Infrastructure as Code concepts (Terraform)
Experience with monitoring and observability tools (Datadog, Prometheus, Grafana)
Understanding of containerization and Kubernetes fundamentals
Knowledge of CI/CD pipeline concepts and deployment automation
Experience with configuration management or automation tools
Previous experience in customer-facing technical support roles
Background in system administration or DevOps practices
Experience with incident management frameworks (ITIL, SRE practices)
Understanding of SLA/SLO concepts and reliability engineering principles

Benefits

Medical, Dental, and Vision Insurance
Unlimited PTO
13 paid company holidays annually
Updater Stock Options
401(k)
Commuter Benefits
Personal Wellbeing Subsidy
New Hire Subsidy
One Medical Membership
Short Term Disability Insurance
Supplemental Life Insurance
12 weeks of Primary Caregiver Parental Leave

Associate Production Support Engineer

Updater

Remote

DevOps

Entry Level
