Staff Software Engineer, ML Performance Optimization

Zoox
Summary
Join Zoox's ML Platform team and lead ML performance optimization initiatives that make our training and inference platform for autonomous driving faster and more efficient. Collaborate with ML teams across Zoox, including Perception, Prediction, Planner, Simulation, Collision Avoidance, and Advanced Hardware Engineering. Develop and execute a strategic vision for the team, leading the design, implementation, and operation of cutting-edge ML training and inference performance optimization techniques. Help engineers grow their careers through mentorship and guidance. This role offers significant growth opportunities as Zoox expands its robotaxi deployments and moves into new ML domains. The team builds and operates the base layer of ML tools, deep learning frameworks, and inference systems that applied research teams use for in-vehicle and off-vehicle ML use cases.
Requirements
- Strong experience with training frameworks such as PyTorch and with using GPUs efficiently for distributed model training
- Experience with GPU-accelerated inference using TensorRT, Ray Serve, or similar frameworks
- Experience using profiling tools such as NVIDIA Nsight or the PyTorch Profiler to identify bottlenecks in model training and serving
- Proficient in Python and C++
- Experience with model compression techniques to reduce model size and improve performance
Responsibilities
- Develop and execute a strategic vision for the ML Performance Optimization team to unlock ML innovation in autonomous driving and rider experience
- Lead the design, implementation, and operation of cutting-edge ML training and inference performance optimization techniques
- Collaborate closely with cross-functional teams, including ML researchers, software engineers, data engineers, and hardware engineers, to define requirements and align on architectural decisions
- Help the engineers on the team grow their careers by providing technical guidance and mentorship
Preferred Qualifications
- 10+ years of total experience, including 4+ years of working on large-scale model training or inference platforms
- Excellent leadership skills with a demonstrated ability to lead high-performing engineering teams
Benefits
- Paid time off (e.g. sick leave, vacation, bereavement)
- Unpaid time off
- Zoox Stock Appreciation Rights
- Amazon RSUs
- Health insurance
- Long-term care insurance
- Long-term and short-term disability insurance
- Life insurance