Lead Engineer, Machine Learning Ops

Code and Theory
Summary
Join Code and Theory's AI/ML engineering team as a Lead ML+DevOps Engineer. You will design and implement MLOps pipelines, manage cloud-based resources (AWS, GCP, Azure), automate model deployment, collaborate with data scientists, monitor infrastructure performance, and stay current with industry trends. This high-visibility role involves working with internal and external clients to deliver scalable machine learning solutions across a range of applications. The ideal candidate will have extensive experience deploying machine learning models to cloud environments and strong expertise in Docker and Terraform. Code and Theory is a global, remote-first agency with a diverse client base and a collaborative work environment.
Requirements
- Extensive experience deploying machine learning models to cloud environments
- Strong expertise in Docker containerization and container orchestration
- Proficiency in Terraform for infrastructure as code (IaC) and cloud resource management
- Hands-on experience with streaming data platforms (e.g., Kafka, Kinesis)
- Solid understanding of data cleaning, transformation, and ETL processes
- Experience with CI/CD tools and pipelines (e.g., Jenkins, GitLab CI)
- Strong programming skills in Python
- Excellent problem-solving skills and the ability to think critically and creatively
- Strong communication skills with the ability to convey technical concepts to non-technical stakeholders
Responsibilities
- Design and implement MLOps pipelines to ensure consistency across the organization
- Configure and manage cloud-based resources (e.g., AWS, GCP, Azure) to support AI/ML workloads, leveraging containerization as needed
- Automate model deployment and management with scripts and tooling to streamline the release process
- Collaborate with data scientists and engineers to understand their requirements and develop tailored MLOps solutions
- Monitor and optimize AI/ML infrastructure by analyzing system performance and identifying bottlenecks
- Stay up to date with industry trends and best practices, applying this knowledge to improve our organization's MLOps capabilities
Preferred Qualifications
- Familiarity with ML frameworks (e.g., TensorFlow, PyTorch) is a plus
Benefits
- Remote-first approach