Infrastructure Operations Engineer

Voltage Park
Summary
Join Voltage Park, an enterprise AI factory, as a highly skilled Infrastructure Operations Engineer. This 24/7 role focuses on ensuring the stability, scalability, and performance of our compute, storage, and platform infrastructure. You will design, build, and roll out new platforms, deploy updates, and collaborate with various teams. The position offers full remote flexibility within the continental US and requires on-call rotation. The ideal candidate possesses strong technical skills, interpersonal abilities, and a passion for operational excellence. Unfortunately, we are unable to provide sponsorship for this role.
Requirements
- 8+ years working with Linux as a server / hosting platform, extra points for Ubuntu experience
- 5+ years experience with AWS
- 2+ years experience with Kubernetes and strong container fundamentals
- 2+ years experience with Terraform and Ansible
- 2+ years with network attached storage management (via NFS, Ceph, or other protocols)
- Experience working in a Slack-first, asynchronous remote work environment
- Experience with monitoring systems (Prometheus, ELK stack)
- Familiarity with the GitOps workflow
- Software development experience using Python, Go, bash, or other languages for the purposes of automation & connecting systems & APIs together
- Deep networking fundamentals, extra points for experience with datacenter level networks, 400Gb Ethernet, and InfiniBand
- Experience building and delivering complex systems
- Effective at navigating tradeoffs between design, risk, cost, and outcomes
- Comfortable with navigating ambiguity
- Strong written and oral communication
Responsibilities
- At the direction of the Manager of Infrastructure Operations, design, build, and roll out new platforms and patterns to minimize incidents and enable customer-facing and internal features
- Deploy updates and improvements to support both Voltage Park's internal and end customer use cases
- Collaborate with colleagues in Infrastructure Engineering, Network Operations, Customer Success, and Software and Platform Development teams
- Participate in the on-call rotation, which is evenly distributed across all team members in a primary / secondary pattern: you serve as primary, then move to the secondary position
Preferred Qualifications
- Experience with bare metal hardware troubleshooting and provisioning, extra points for working with Dell hardware
- Experience with GPU servers, either in bare metal form or under virtualization
- Deep experience with network switches, routers, and firewalls, particularly SONiC switches, Palo Alto firewalls and Juniper Networks as vendors
- Experience with VAST storage systems
Benefits
This position offers full remote flexibility, although candidates must be based in the continental US and available to work during PST hours.

