Summary
Join CoreWeave, a leading AI hyperscaler, as an NVLink Engineer supporting large-scale data center deployments. You will work at the forefront of infrastructure technology, keeping NVLink systems performant and stable. You will collaborate with global teams and customers, troubleshoot complex issues, and drive operational excellence. The role involves full lifecycle management of NVLink hardware and software, building automation tools, and participating in a rotating on-call schedule. CoreWeave offers a dynamic environment where you will solve complex problems, tackle exciting industry challenges, and make a significant impact.
Requirements
- Solid understanding of networking fundamentals
- Proven background in troubleshooting network and server hardware at the component level
- Strong Linux system administration skills
- Proficiency in at least one programming language (e.g., Python, Go)
- Proven ability to troubleshoot and debug complex application issues
- Excellent communication and collaboration skills
- Experience with Ansible
Responsibilities
- Support the deployment of NVLink systems across large data center environments
- Support the full lifecycle management of NVLink hardware and software components
- Build and maintain tooling to automate and streamline deployment, monitoring, and troubleshooting workflows
- Diagnose and resolve performance, connectivity, and stability issues in complex environments
- Collaborate with internal teams and external customers worldwide
- Participate in a rotating on-call schedule to ensure 24/7 support coverage
Preferred Qualifications
- Experience with InfiniBand networking
- Experience managing large-scale environments (1,000+ switches or nodes)
- Prior experience with NVLink technologies
- Knowledge of Redfish API for system management
- Experience with NVUE (NVIDIA User Experience)
- Background with SONiC
- Experience with Grafana/PromQL
Benefits
- Medical, dental, and vision insurance, 100% paid for by CoreWeave
- Company-paid life insurance
- Voluntary supplemental life insurance
- Short-term and long-term disability insurance
- Flexible Spending Account
- Health Savings Account
- Tuition reimbursement
- Mental wellness benefits through Spring Health
- Family-forming support provided by Carrot
- Paid Parental Leave
- Flexible, full-service childcare support with Kinside
- 401(k) with a generous employer match
- Flexible PTO
- Catered lunch each day in our office and data center locations
- A casual work environment
- A work culture focused on innovative disruption