HPC NVLink Operations Engineer

CoreWeave
Summary
Join CoreWeave, a leading AI hyperscaler, as an NVLink operations engineer supporting large-scale data center deployments. You will be responsible for the deployment and lifecycle management of NVLink systems, diagnosing and resolving performance, connectivity, and stability issues, collaborating with internal teams and customers worldwide, and participating in a rotating on-call schedule that provides 24/7 support coverage. The role requires a basic understanding of networking fundamentals, experience troubleshooting network and server hardware, Linux system administration experience, and excellent communication skills. CoreWeave offers a competitive salary and comprehensive benefits, including medical, dental, and vision insurance, life insurance, disability insurance, a 401(k), flexible PTO, and more. The company prioritizes a hybrid work environment, with remote work considered for candidates located far from an office. CoreWeave is committed to fostering an inclusive and supportive workplace.
Requirements
- Basic understanding of networking fundamentals
- Experience troubleshooting network and server hardware at the component level
- Experience with Linux system administration
- Ability to troubleshoot and debug complex application issues
- Excellent communication and collaboration skills
Responsibilities
- Support the deployment of NVLink systems across large data center environments
- Support the full lifecycle management of NVLink hardware and software components
- Diagnose and resolve performance, connectivity, and stability issues in complex environments
- Collaborate with internal teams and external customers worldwide
- Participate in a rotating on-call schedule to ensure 24/7 support coverage
Preferred Qualifications
- Experience working in large-scale environments (1,000+ switches or nodes)
- Familiarity with Ansible
- Understanding of Redfish API for system management
- Experience with NVUE (NVIDIA User Experience) or a similar network-based CLI
- Experience with Grafana/PromQL
- Proficiency in at least one programming language (e.g., Python, Go)
Benefits
- Medical, dental, and vision insurance - 100% paid for by CoreWeave
- Company-paid Life Insurance
- Voluntary supplemental life insurance
- Short-term and long-term disability insurance
- Flexible Spending Account
- Health Savings Account
- Tuition Reimbursement
- Mental Wellness Benefits through Spring Health
- Family-Forming support provided by Carrot
- Paid Parental Leave
- Flexible, full-service childcare support with Kinside
- 401(k) with a generous employer match
- Flexible PTO
- Catered lunch each day in our office and data center locations
- A casual work environment
- A work culture focused on innovative disruption