Summary
Join Quora's Machine Learning Platform team and help build the company-wide ML development platform. This role focuses on ML infrastructure and distributed systems (80%), with some model deployment support (20%). You will design, develop, and maintain core infrastructure, build scalable distributed systems for serving ML models, optimize infrastructure performance, collaborate with ML engineers, and contribute to next-generation ML infrastructure. Your work will significantly impact Quora's long-term success. This is a remote-first position available in multiple countries. The team works with a range of algorithms and operates at large scale, reaching over a hundred million users each month.
Requirements
- Availability for meetings and impromptu communication during Quora's "coordination hours" (Mon-Fri: 9am-3pm Pacific Time)
- Experience with building and owning end-to-end machine learning or data science-related systems
- Experience instrumenting ML workloads for performance monitoring/efficiency
- Experience with high-performance, large-scale distributed systems
- 4+ years of industry experience in Machine Learning, Infrastructure or related fields
- 4+ years of experience writing production code in Python, C++, or a similar language
- BS or MS in Computer Science, Engineering or a related technical field
Responsibilities
- Design, develop, and maintain the core infrastructure that powers Quora's machine learning platform, ensuring high availability, scalability, and performance
- Build scalable and reliable distributed systems for serving machine learning models
- Optimize infrastructure performance across the ML platform, identifying and resolving bottlenecks to meet the demands of large-scale machine learning workloads
- Collaborate with machine learning engineers to understand their infrastructure needs and provide solutions that enable them to build and deploy models efficiently
- Contribute to the design and implementation of our next-generation machine learning infrastructure, focusing on scalability, reliability, and cost-effectiveness
- Develop services on top of open-source technologies like Kubernetes, TensorFlow, and PyTorch
- Own business-critical infrastructure, help resolve production issues, and participate in the team-wide on-call rotation
- Collaborate with ML engineers who use the platform, and help them be more impactful
Preferred Qualifications
- Strong communication and interpersonal skills; experience working with ML teams is a plus
- Experience working with Kubernetes, Docker, Terraform, or similar container and infrastructure-as-code tooling
- Hands-on experience with AWS technologies such as EC2, EBS, S3, and EKS
Benefits
- Medical/dental/vision coverage
- Equity refreshers
- Remote work reimbursement
- Paid time off
- Employee assistance programs