Summary
Join Flip.shop, a rapidly growing social commerce company, as a Machine Learning Infrastructure Engineer. You will design, build, and optimize the infrastructure powering our AI-driven platform. This role involves ensuring efficient deployment, scaling, and monitoring of machine learning models. You will collaborate with data scientists and engineers to create seamless integration between development and production environments. The position requires expertise in infrastructure engineering, DevOps, and cloud platforms. You will contribute to building scalable, high-performance systems supporting real-time recommendations and driving business growth.
Requirements
- Deep expertise in infrastructure engineering, DevOps, or similar domains, with a focus on supporting machine learning workflows in production
- Strong proficiency in cloud platforms (AWS, GCP, or Azure), containerization (Docker, Kubernetes), CI/CD pipelines, and infrastructure-as-code tools (Terraform, Ansible)
- Experience working with machine learning frameworks (TensorFlow, PyTorch, or similar) and familiarity with MLOps practices
- Proven track record of optimizing infrastructure for performance, scalability, and reliability in production environments
- Strong teamwork skills, with the ability to partner with ML engineers and data scientists to streamline workflows
- Ability to communicate complex infrastructure solutions to technical and non-technical stakeholders
- Passion for solving infrastructure challenges that support real-time machine learning at scale
Responsibilities
- Design and implement scalable infrastructure for deploying, monitoring, and maintaining machine learning models in production environments
- Build tools to automate workflows for model training, testing, and deployment, ensuring that machine learning models can move quickly from development to production
- Leverage cloud platforms to create efficient, scalable systems for large-scale machine learning workloads
- Ensure the infrastructure supports high-performance model inference at scale, with a focus on minimizing latency and maximizing throughput
- Work closely with data scientists, machine learning engineers, and DevOps teams to create seamless integration between development and production environments
- Build robust monitoring systems to track model performance and infrastructure health, ensuring reliability and uptime of machine learning services
- Implement best practices in infrastructure security, data privacy, and compliance, particularly when handling sensitive user data
Preferred Qualifications
- Experience with SageMaker
Benefits
- Equity
- Bonuses
- Long-term incentives
- A PTO policy
- Other progressive benefits