Remote GPU Engineer

Reka AI

Remote - United States, United Kingdom

Job highlights

Summary

Reka, a globally distributed AI startup, is hiring a GPU Engineer to improve its large-scale training infrastructure and optimize model training performance. The ideal candidate has strong engineering skills in Python and PyTorch or other acceleration libraries, experience writing low-level GPU code and scaling GPU jobs on large compute clusters, and proficiency in implementing robust monitoring systems for performance tracking.

Requirements

  • Strong engineering skills with fluency in Python and PyTorch or other acceleration libraries
  • Experience writing and debugging low-level GPU code (CUDA, Triton) and debugging hardware errors
  • Experience scaling up GPU jobs via large-scale compute clusters using Slurm or Kubernetes

Responsibilities

  • Design and implement improvements to our large-scale training infrastructure
  • Help make technical decisions to optimize models' training performance and efficiency

Preferred Qualifications

  • Knowledge of advanced filesystems, particularly Ceph, and their integration and optimization in large systems
  • Proficiency in implementing robust monitoring systems for performance tracking and anomaly detection

Benefits

  • 4 weeks paid leave
  • Healthcare benefits, including vision and dental
  • Visa support (such as H-1B and OPT transfer for US employees)
  • Open and inclusive work environment
