Senior Manager, ML Platform
Stitch Fix
Job highlights
Summary
Join Stitch Fix's ML Platform team as a Senior Engineering Manager and lead the development and scaling of machine learning infrastructure. You will define the team's vision, mentor engineers, and build robust systems for model training and deployment. This high-visibility role requires collaboration with various teams and a deep understanding of ML lifecycle management. The position offers a competitive salary, benefits, and equity, and is fully remote. You will leverage your expertise in distributed systems, cloud technologies, and big data tools to empower data scientists and shape the future of machine learning at Stitch Fix. The ideal candidate possesses extensive experience in software engineering and team management, along with a proven track record of building scalable ML systems.
Requirements
- 10+ years of experience in software engineering, including 5+ years managing teams
- Proven track record of leading ML platform/infrastructure teams or building scalable, production-grade ML systems
- Strong understanding of ML lifecycle management, including model training, deployment, monitoring, and scaling
- Proficiency in distributed systems, cloud technologies (AWS preferred), Kubernetes, and big data tools like Kafka, Flink, Spark, and Redis
- Strategic thinker with the ability to align technical priorities with business goals
- Demonstrated ability to grow and mentor diverse teams, fostering a culture of innovation and high performance
- Strong communication skills, capable of effectively engaging stakeholders and influencing decision-making
- A track record of working cross-functionally to solve challenging problems and deliver impactful results
- Deep understanding of the challenges faced by ML practitioners and a passion for building tools that empower them to innovate
- Experience building self-service tools and platforms for data scientists
- Background in developing ML-related APIs, frameworks, and data pipelines
- Hands-on experience with feature engineering, model training and serving frameworks, or ML monitoring systems
Responsibilities
- Define and communicate a compelling vision for Stitch Fix's ML Platform, aligned with company objectives
- Identify high-leverage opportunities to enhance platform capabilities and scalability, ensuring long-term success
- Build, mentor, and lead a high-performing team of engineers
- Foster a collaborative, inclusive team culture emphasizing growth, innovation, and accountability
- Drive the delivery of scalable, resilient systems and frameworks for ML model training and deployment
- Ensure the team operates efficiently, balancing new feature development with system reliability and performance improvements
- Partner with senior leaders in engineering, data science, and business teams to identify and address complex distributed system challenges
- Advocate for the needs of internal customers and ensure a seamless user experience on the ML Platform
- Provide hands-on technical leadership when necessary, ensuring best practices in system design, software development, and ML operations are upheld
- Continuously engage with data scientists, engineers, and business partners to understand their needs and ensure the ML Platform team delivers impactful solutions
Preferred Qualifications
- Familiarity with modern ML frameworks such as PyTorch and Ray
- Advanced degree (MS/PhD) in Computer Science, Engineering, or a related field
Benefits
- Competitive salary
- Equity, including new hire and ongoing grants of restricted stock units
- Medical, dental, vision, and other benefits
- Fully remote position