Senior Hardware Engineer

CoreWeave

💵 $165k-$220k
📍 Remote - United States

Summary

Join CoreWeave, a leading AI hyperscaler, as an engineer specializing in GPU and PCIe troubleshooting. As part of the Hardware Engineering team, you will contribute to the design, development, troubleshooting, and optimization of server hardware infrastructure, and collaborate with cross-functional teams and vendors to deliver high-performance hardware solutions. The role requires expertise in GPU and PCIe technologies, automation, and server hardware management. CoreWeave offers a competitive salary, comprehensive benefits, and a hybrid work environment, and is committed to fostering a collaborative and inclusive workplace.

Requirements

  • Prior experience supporting and troubleshooting data center-class GPUs (preferably A100 or newer)
  • Proficiency in Ansible and Python, and experience interacting programmatically with server BMCs via IPMI or Redfish (Redfish preferred)
  • Experience using, integrating, and automating data center-class GPU diagnostic and troubleshooting tools
  • In-depth knowledge of server hardware, components, and management technologies, particularly GPUs and PCIe devices
  • Proven ability to stay updated with the latest industry technologies and trends
  • Previous experience collaborating with hardware vendors
  • Strong passion for automation, with a commitment to automating processes comprehensively
  • Excellent documentation skills and attention to detail
  • Strong analytical and problem-solving abilities

Responsibilities

  • Troubleshoot complex GPU- and PCIe-related failures
  • Partner with external vendors on failure analysis
  • Track component RMAs
  • Develop and maintain hardware/firmware management services
  • Automate all aspects of the server hardware lifecycle
  • Serve as the senior point of contact for hardware escalation and troubleshooting
  • Collaborate with cross-functional teams to define hardware requirements, specifications, and system architecture
  • Create and maintain accurate documentation of hardware designs, specifications, test procedures, and results
  • Analyze and optimize the performance of hardware systems, identify bottlenecks, and propose improvements for enhanced efficiency
  • Establish processes for internal hardware testing, deployment, and performance optimization

Benefits

  • Medical, dental, and vision insurance, 100% paid for by CoreWeave
  • Company-paid life insurance
  • Voluntary supplemental life insurance
  • Short- and long-term disability insurance
  • Flexible Spending Account
  • Health Savings Account
  • Tuition Reimbursement
  • Mental Wellness Benefits through Spring Health
  • Family-Forming support provided by Carrot
  • Paid Parental Leave
  • Flexible, full-service childcare support with Kinside
  • 401(k) with a generous employer match
  • Flexible PTO
  • Catered lunch each day in our office and data center locations
  • A casual work environment
  • A work culture focused on innovative disruption
  • Hybrid work environment
  • Remote work options for those not within 30 miles of an office
