Senior Staff Software Engineer, ML Training & Serving

Liftoff Mobile
Summary
Join Liftoff's Training & Serving Team as a Senior Staff Software Engineer and design, build, and maintain scalable infrastructure for training and serving deep learning models. Collaborate with ML teams to productionize neural network models, ensuring infrastructure meets evolving demands. Instrument and monitor system performance, proactively identifying bottlenecks. Profile and optimize ML model performance across the training and inference stack. Build internal tooling and automation to accelerate the ML experimentation cycle. Utilize AWS, PyTorch, PySpark, and in-house tools. This role offers full-time remote work (PST preferred) in select states and Canada, with quarterly in-person team gatherings. Liftoff provides a full compensation package including equity and health/vision/dental benefits.
Requirements
- Very strong coding ability (any backend language)
- Strong core CS fundamentals (data structures, algorithms, architecting systems)
- 12+ years of related experience with a Bachelor's degree, or 8+ years with a Master's degree
- Experience building Machine Learning tooling or platforms, in particular when applied to large scale problems
- A passion for quality and excellence, and the ability to temper it when necessary to ship
Responsibilities
- Design and build scalable, resilient infrastructure for training and serving deep learning models, ensuring millisecond-level latency and high availability under heavy traffic
- Collaborate with ML modeling and platform teams to productionize cutting-edge neural network models, ensuring infrastructure meets the demands of evolving model architectures and workloads
- Instrument and monitor system performance with detailed observability tooling, proactively identifying bottlenecks in model inference, data throughput, or distributed training
- Profile and optimize ML model performance across the training and inference stack
- Build internal tooling and automation to accelerate the ML experimentation cycle, including experiment tracking, model versioning, reproducibility tools, and rapid feedback loops for researchers and engineers
- Utilize vendor-based products (AWS, etc.), open-source technologies (PyTorch, PySpark, etc.) and in-house tooling
Preferred Qualifications
- Experience with NVIDIA Triton, PyTorch, or PySpark is a big plus
- Experience working in a high-growth startup atmosphere is a plus
- Previous experience in ad-tech
- Experience with Python, Golang
Benefits
- Health/vision/dental benefits
- Equity
