Remote HPC Network Engineer

CoreWeave

💵 $160k-$210k
📍 Remote - United States

Job highlights

Summary

CoreWeave is seeking a highly skilled HPC Network Engineer to join its fast-growing team. The role involves monitoring, troubleshooting, supporting, deploying, and configuring InfiniBand fabrics. The ideal candidate is proficient in InfiniBand configuration and management, network architectures and topologies, Linux system administration, and at least one scripting language. Preferred skills include experience with Nvidia UFM, the SLURM job scheduler, Grafana, HPC systems architecture, MPI implementations, automation and configuration management tools such as Ansible, and open-source technologies pertinent to HPC administration. Compensation ranges from $160,000 to $210,000. The role requires attending onboarding training in New Jersey, followed by one week of travel each quarter.

Requirements

  • Proficient in InfiniBand configuration and management
  • Solid understanding of network architectures, topologies, best practices, and techniques for high performance and availability
  • Familiarity with optical networking hardware
  • Experience in Linux system administration
  • Proficiency in at least one scripting language
  • Team player with effective collaboration skills
  • Ability to manage multiple tasks and projects concurrently

Responsibilities

  • Monitoring the performance and overall health of InfiniBand fabrics
  • Troubleshooting various issues that may arise within InfiniBand fabrics
  • Providing assistance and collaboration to other teams involved in the management and operation of HPC clusters utilizing InfiniBand technology
  • Helping install large fabrics: organizing and working with teams, onsite personnel, and customers to bring fabrics from day 0 to an operational state
  • Working with configuration tooling and operations teams to carry out maintenance and upgrades of the switches and control plane of the fabrics

Preferred Qualifications

  • Hands-on experience with Nvidia UFM
  • Familiarity with the SLURM job scheduler
  • Experience or familiarity with Grafana for monitoring and visualization
  • Insight into HPC systems architecture and operational workflows
  • Familiarity with various MPI implementations
  • Experience with automation and configuration management tools such as Ansible
  • Acquaintance with open-source technologies pertinent to HPC administration, including resource management, storage systems, monitoring infrastructure, software deployment, and continuous integration

Benefits

  • Medical, dental and vision insurance - 100% paid for the employee
  • Company-paid life insurance
  • Voluntary supplemental life insurance
  • Short and long-term disability insurance
  • Flexible Spending Account
  • Tuition Reimbursement
  • Mental Wellness Benefits through Spring Health
  • Family-Forming support provided by Carrot
  • Paid Parental Leave
  • Flexible, full-service childcare support with Kinside
  • 401(k) with a generous employer match
  • Flexible PTO
  • Catered lunch each day in our offices
  • Weekly massages in NJ office
  • A casual work environment
  • Work culture focused on innovative disruption
