United Kingdom, Spain
Staff Software Engineer
$230k-$322k
Remote - United States
Summary
Join Reddit's Machine Learning Platform team as a Staff Software Engineer, Training Platform! You will play a pivotal role in architecting, implementing, and maintaining foundational ML infrastructure. This role involves building systems and tools for machine learning engineers and data scientists, focusing on continuous improvement of the ML software development lifecycle. You will lead the development and maintenance of ML infrastructure, design high-performance solutions, and optimize for performance and cost-efficiency. Mentorship of team members is also a key aspect of this position. This fully remote-friendly role offers a competitive salary and a comprehensive benefits package.
Requirements
- 8+ years of work experience in a production software development environment or building data systems, plus a degree in ML, Engineering, Computer Science, or another relevant discipline
- Experience with the design and architecture of large-scale ML systems
- Experience with ML frameworks such as TensorFlow, PyTorch, or JAX
- Experience with training workflows, hyperparameter tuning, and resource optimization on CPU and GPU
- Experience with MLOps practices and tools such as Ray and MLflow
- Hands-on experience with Kubernetes, Docker, or other container orchestration systems
- Experience building production-quality code that incorporates testing, evaluation, and monitoring, using object-oriented programming in Python and/or Go
- Comfortable with distributed systems, petabyte-scale big data, and data-intensive systems
Responsibilities
- Lead the building, testing, and maintenance of ML infrastructure at Reddit
- Propose, design, and implement high-performance ML platform solutions that significantly advance the deployment of models serving millions of redditors and provide a seamless experience for MLEs
- Play a pivotal role in designing, building, and optimizing the infrastructure and tooling required to support large-scale machine learning workflows
- Design and implement solutions that significantly advance the architecture of the ML Platform
- Analyze bottlenecks in distributed systems and optimize for performance and cost-efficiency
- Work with management on team goal setting, planning, and de-risking project execution
- Mentor other team members in adopting a rigorous DevOps approach to maintain and/or improve the health and quality of ML platform components and services
Benefits
- Comprehensive Healthcare Benefits
- 401k Matching
- Workspace benefits for your home office
- Personal & Professional development funds
- Family Planning Support
- Flexible Vacation (please use them!) & Reddit Global Wellness Days
- 4+ months paid Parental Leave
- Paid Volunteer time off
- Medical, dental, and vision insurance