Storage Engineer

Voltage Park

📍 Remote - Worldwide

Summary

Join Voltage Park, a company focused on making machine learning infrastructure accessible to all, as a Storage Engineer. You will be responsible for building, maintaining, and operating our customer-facing storage system. This fully remote role (continental US, PST hours) requires expertise in HPC storage systems, particularly VAST storage systems. You will own the storage system lifecycle, define SOPs, perform performance tuning, troubleshoot issues, and collaborate with other teams. The ideal candidate possesses strong experience in HPC storage, is proficient with various technologies, and thrives in a collaborative, autonomous environment. Sponsorship is not provided for this position.

Requirements

  • Proven experience in deploying and managing storage solutions for large-scale HPC infrastructures
  • Experience with VAST storage systems
  • Expertise in NFS, high-performance parallel file systems, and related storage networking technologies
  • Strong understanding of HPC architectures and storage performance optimization techniques
  • Experience with bare metal servers in a datacenter environment
  • Experience with Linux, Terraform, and Ansible
  • Strong communication skills and the ability to collaborate effectively with technical and non-technical stakeholders
  • Experience architecting, building, and delivering complex systems from 0 to 1
  • Ability to balance pragmatic development against ideal architectures
  • Effective at navigating tradeoffs between design, risk, cost, and outcomes

Responsibilities

  • Own the full lifecycle of a multi-petabyte, multi-datacenter VAST storage system
  • Define SOPs and runbooks for handling storage system events
  • Work on performance tuning, client optimization, and troubleshooting of speed and reliability issues
  • Optimize storage performance and scalability for large-scale GPU infrastructure
  • Collaborate with other engineers and teams to integrate storage solutions
  • Stay updated with the latest storage technologies and best practices in HPC
  • Be on-call for urgent system incidents

Benefits

This is a fully remote role, but you must be located in the continental US and available to work PST hours.
