
Machine Learning Engineer

Summary
Join Reddit's Ads Training Platform pod as an engineer to design, build, and maintain large-scale distributed training infrastructure for Ads ML models. You will develop tools and frameworks on the Ray platform, build tools for debugging and profiling distributed training jobs, integrate with object storage systems, and collaborate with ML engineers to reduce model training time and GPU training costs and improve training efficiency. You will also drive improvements in scheduling, state management, and fault tolerance. Reddit offers flexible work arrangements, including remote work options in countries where it has a physical presence. The company is committed to building a diverse and inclusive workforce and provides reasonable accommodations for qualified individuals with disabilities.
Requirements
- 3+ years in infrastructure/platform engineering or large-scale distributed systems
- 2+ years of hands-on experience with the Ray platform
- Strong understanding of distributed computing principles (task scheduling, fault tolerance, state management)
- Experience with distributed storage systems and large-scale data processing
- Proven ability to debug and profile distributed jobs
Responsibilities
- Design, build, and maintain large-scale distributed training infrastructure for Ads ML models
- Develop tools and frameworks on top of the Ray platform
- Build tools to debug, profile, and tune distributed training jobs for performance and reliability
- Integrate with object storage systems and improve data access patterns
- Collaborate with ML engineers to reduce model training time and GPU training costs and improve training efficiency
- Drive improvements in scheduling, state management, and fault tolerance within the training platform to enhance overall performance
Preferred Qualifications
- Experience with deep learning frameworks (PyTorch, TensorFlow) is a big plus
- Bonus: experience with model optimization for distributed training or with Ads ML
Benefits
- Comprehensive Healthcare Benefits and Income Replacement Programs
- 401k Match
- Family Planning Support
- Gender-Affirming Care
- Mental Health & Coaching Benefits
- Flexible Vacation & Reddit Global Days Off
- Generous Paid Parental Leave
- Paid Volunteer Time Off


