Summary
Join Flip.shop, a rapidly growing social commerce company, as a Senior Machine Learning Engineer specializing in Machine Learning Infrastructure. You will design, build, and optimize the infrastructure that powers our AI-driven platform. The role covers efficient deployment, scaling, and monitoring of machine learning models, streamlining the development lifecycle, and building scalable, production-grade systems. You will collaborate with data scientists, machine learning engineers, and DevOps teams. The position requires a Bachelor's degree in a related field, 3+ years of experience building scalable systems, and proficiency in C/C++ or Golang on Linux, along with GPU and machine learning framework experience. Flip.shop offers a competitive compensation and benefits package.
Requirements
- Bachelor's degree or higher in Computer Science or a related field, with 3+ years of experience in building scalable systems
- Proficiency in at least one of C/C++ or Golang in a Linux environment
- Solid understanding of GPU hardware architecture, GPU software stack (CUDA, cuDNN), and experience in GPU performance analysis
- Experience with inference, training, debugging, and tuning of deep learning models
- Familiarity with mainstream machine learning frameworks (e.g., TensorFlow, PyTorch, MXNet)
- Familiarity with MLOps practices
- Experience with big data frameworks (e.g., Spark, Hadoop, Flink) and resource management and task scheduling for large-scale distributed systems
- Experience in using or designing open-source machine learning lifecycle management systems like TFX
- Excellent logical analysis and problem-solving skills with the ability to abstract and decompose complex business logic
- Strong sense of responsibility, good learning ability, communication skills, and self-motivation, with the ability to respond and act quickly
- Good documentation habits, including writing workflow and technical documentation promptly and keeping it up to date
Responsibilities
- Design and implement scalable infrastructure for deploying, monitoring, and maintaining machine learning models in production environments
- Design and implement machine learning systems for feeds, ads, and search ranking models
- Optimize the serving and training infrastructure of machine learning models
- Improve workflows for model training and serving, data pipelines, storage systems, and resource management within multi-tenant machine learning systems
- Build tools to automate workflows for model training, testing, and deployment, ensuring that machine learning models can move quickly from development to production
- Ensure the infrastructure supports high-performance model inference at scale, with a focus on minimizing latency and maximizing throughput
- Work closely with data scientists, machine learning engineers, and DevOps teams to create seamless integration between development and production environments
- Build robust monitoring systems to track model performance and infrastructure health, ensuring reliability and uptime of machine learning services
- Implement best practices in infrastructure security, data privacy, and compliance, particularly when handling sensitive user data
Benefits
- Equity
- Bonuses
- Long-term incentives
- Paid time off (PTO)
- Other progressive benefits