Technical Support Engineer

Voltage Park
Summary
Join Voltage Park's Customer Experience team as a Technical Support Engineer and take charge of high-impact incidents, leading real-time responses and ensuring clear communication with customers and internal stakeholders. You will be the go-to person for resolving critical issues, working at the intersection of engineering, data center operations, and customer needs. This role demands strong technical skills, excellent communication, and the ability to collaborate effectively across teams. You will own incidents from detection to resolution, implement long-term solutions, and continuously improve support processes. The ideal candidate is calm under pressure, technically sharp, customer-centric, a problem-solver, and a team player. This position requires on-call availability for urgent incidents.
Requirements
- Track record of managing customer escalations and technical comms across all levels, from execs to engineers
- Proven ability to deliver complex systems or projects from 0 to 1
- Willingness and ability to participate in a weekend on-call rotation
- Experience running or supporting infrastructure at scale (cloud, bare metal, or both)
- 5+ years as a Senior Linux Systems Administrator, Infrastructure Support Engineer, or Data Center Operations Lead
- Senior-level Linux system administration experience; able to operate confidently from the command line
- Scripting experience in Bash, Python, or JavaScript
- Experience diagnosing issues with distributed training workloads and GPUs
- Familiarity with job schedulers like Slurm or Kubernetes
Responsibilities
- Serve as Incident Commander during outages and service degradation, leading response efforts across engineering and customer experience
- Own technical incidents from detection to resolution, driving urgency and accountability
- Communicate clearly with internal stakeholders and customers, keeping everyone aligned and informed
- Help implement long-term solutions to issues uncovered by root cause analysis
- Develop tools, documentation, and processes to improve incident response and support quality
- Partner closely with customers to understand their business, leveraging this knowledge to provide a personalized, consistent experience
- Continuously look for ways to improve the support experience, both human and technical
- Maintain on-call availability for urgent incidents; you're ready to jump in when others need you most
Preferred Qualifications
- AI/ML infrastructure support experience, especially involving model training and orchestration
- Experience with cloud support, data center operations, or startup environments
- Strong documentation and process improvement skills
- Project management experience across technical and non-technical teams