Site Reliability Engineer
Voltage Park
Remote - Worldwide
Please let Voltage Park know you found this job on JobsCollider. Thanks!
Job highlights
Summary
Join Voltage Park, a company with a mission to make AI infrastructure accessible to all, as a Site Reliability Engineer. You will be responsible for building and operating core infrastructure, including bare metal provisioning, telemetry, storage, and container/VM orchestration. This fully remote role requires managing thousands of GPU servers and related infrastructure, contributing to the company's culture, and defining its success. You will collaborate with colleagues across various teams in a flat organization. The role requires significant experience with Linux, AWS, Kubernetes, and other technologies. Location in the US is required; visa sponsorship is not provided.
Requirements
- 8+ years working with Linux as a server / hosting platform; extra points for Ubuntu experience
- 5+ years experience with AWS
- 2+ years experience with Kubernetes and strong container fundamentals
- 2+ years experience with Terraform and Ansible
- 2+ years with network-attached storage management (via NFS, Ceph, or other protocols)
- Experience working in a Slack-first, asynchronous remote work environment
- Experience with monitoring systems (Prometheus, ELK stack)
- Familiarity with the GitOps workflow
- Software development experience using Python, Go, Bash, or other languages to automate and connect systems and APIs
- Deep networking fundamentals
- Experience architecting, building, and delivering complex systems from 0 to 1
- Adept at balancing pragmatic development and ideal architectures. Effective at navigating tradeoffs between design, risk, cost, and outcomes
- Comfortable with navigating ambiguity
- Strong written and oral communication
Responsibilities
- At the direction of the Manager of Site Reliability Engineering, design, build, and roll out new platforms and patterns to minimize incidents and enable customer-facing and internal features
- Deploy updates and improvements to support both Voltage Park's internal and end customer use cases
- Collaborate with colleagues in network engineering, software development, and customer support in a flat organization
- Participate in the SRE on-call rotation (1 week on, 5+ weeks off)
Preferred Qualifications
- Experience with bare metal hardware troubleshooting and provisioning; extra points for working with Dell hardware
- Experience with GPU servers, either in bare metal form or under virtualization
- Deep experience with network switches, routers, and firewalls, particularly SONiC switches and Palo Alto firewalls
- Experience with VAST storage systems
Benefits
Fully remote role