Lead Engineer, Machine Learning Ops

Code and Theory
Summary
Join Code and Theory's AI/ML engineering team as a Lead ML+DevOps Engineer and build software that solves real-world problems for diverse clients. You will design and implement MLOps pipelines, configure and manage cloud-based resources, automate model deployment, and collaborate with data scientists and engineers. This high-visibility role requires expertise in cloud deployment, containerization, and related technologies to keep the AI/ML infrastructure scalable and reliable. You will also monitor and optimize infrastructure performance and stay up to date with industry trends. The ideal candidate has extensive experience deploying machine learning models to cloud environments and strong expertise in Docker container orchestration.
Requirements
- Extensive experience in deploying machine learning models to cloud environments
- Strong expertise in Docker container orchestration
- Proficiency in Terraform for infrastructure as code (IaC) and cloud resource management
- Hands-on experience with streaming data platforms (e.g., Kafka, Kinesis)
- Solid understanding of data cleaning, transformation, and ETL processes
- Experience with CI/CD tools and pipelines (e.g., Jenkins, GitLab CI)
- Strong programming skills in Python
- Excellent problem-solving skills and the ability to think critically and creatively
- Strong communication skills with the ability to convey technical concepts to non-technical stakeholders
Responsibilities
- Design and implement MLOps pipelines to ensure consistency across the organization
- Configure and manage cloud-based resources (e.g., AWS, GCP, Azure) to support AI/ML workloads, leveraging containerization as needed
- Automate model deployment and management through scripts and tools to streamline the process
- Collaborate with data scientists and engineers to understand their requirements and develop tailored MLOps solutions
- Monitor and optimize AI/ML infrastructure by analyzing system performance and identifying bottlenecks
- Stay up to date with industry trends and best practices, applying this knowledge to improve our organization's MLOps capabilities
Preferred Qualifications
- Familiarity with ML frameworks (e.g., TensorFlow, PyTorch)