
Senior Software Engineer

Liftoff Mobile
Summary
Join Liftoff's ML Reliability team as a Backend Engineer and help build and maintain robust, observable, high-performing machine learning systems at scale. You will lead the design and evolution of large-scale ML infrastructure, ensuring availability, reliability, and operational excellence for production ML systems. The role involves defining and implementing end-to-end monitoring, alerting, and performance tracking for ML models and data pipelines to ensure model health and data integrity, and partnering with data scientists and platform teams to standardize and scale model deployment, versioning, and A/B experimentation frameworks. You will lead and participate in incident response, conducting root cause analysis and implementing corrective actions to prevent recurrence. You will identify systemic inefficiencies and drive cross-functional efforts to improve system performance and developer productivity, and you will drive adoption of best practices in software and ML engineering, including code quality, risk-driven testing, and explainable, maintainable systems. As a mentor and multiplier, you will help other engineers level up in ML systems, reliability, and architectural thinking, contribute to strategic planning, and partner with product and platform leads to align engineering efforts with business outcomes.
Requirements
- BS in Computer Science with 8+ years of professional experience; or
- MS in Computer Science with 6+ years of professional experience; or
- PhD with 3+ years of professional experience in software engineering or reliability engineering, with a focus on production systems
- Proven ability to drive large technical initiatives and lead projects spanning multiple teams
- Solid core CS fundamentals (data structures, algorithms, system architecture)
- Deep expertise in Python and/or Go; fluency with ML libraries (e.g., TensorFlow, PyTorch) and cloud infrastructure (e.g., AWS)
- Experience with ML monitoring tools (e.g., Prometheus, Grafana)
Responsibilities
- Lead the design and evolution of large-scale ML infrastructure, driving improvements in availability, reliability, and operational excellence for our production ML systems
- Define and implement end-to-end monitoring, alerting, and performance tracking for ML models and data pipelines, ensuring model health and data integrity at scale
- Partner with data scientists and platform teams to standardize and scale model deployment, versioning, and A/B experimentation frameworks
- Lead and participate in incident response efforts, conducting root cause analysis and implementing corrective actions to prevent recurrence
- Identify systemic inefficiencies and opportunities for automation or simplification, and drive cross-functional efforts to improve system performance and developer productivity
- Drive adoption of best practices in software and ML engineering, including code quality, risk-driven testing, and explainable, maintainable systems
- Act as a mentor and multiplier, helping other engineers level up in ML systems, reliability, and architectural thinking
- Contribute to strategic planning and partner with product and platform leads to align engineering efforts with business outcomes
Preferred Qualifications
- Experience with big data engines such as Trino and Spark is a big plus
- Excitement to work on cutting-edge ML infrastructure at massive scale, and to make it simple, reliable, and elegant
- Strong problem-solving skills and the ability to work collaboratively across teams
- Ability to lead across team and role boundaries to effect large-scale change in culture and systems
- A healthy sense of fun!
- Experience with ML systems for training Transformer models and CTR prediction models
- Prior experience in AdTech, mobile growth, or performance marketing domains
- Contributions to open-source ML infrastructure or tools
Benefits
- Equity and health/vision/dental benefits associated with your country of residence
- Health and wellness stipends
- Medical benefits associated with your country of residence

