Senior MLOps Engineer

closed

Tala

πŸ“Remote - Mexico

Summary

Join Tala's mission to unleash the economic power of the Global Majority as a Senior Cloud Infrastructure Engineer, designing and implementing scalable infrastructure for AI/ML systems.

Requirements

  • 4+ years of experience as a DevOps Engineer
  • 1 year of experience managing AI/ML infrastructure in public cloud environments
  • In-depth hands-on experience with at least one public cloud platform, preferably AWS
  • Experience with Python or another programming language
  • Experience with Docker and Kubernetes in production
  • Experience with continuous deployment tools such as Jenkins or ArgoCD
  • Experience with SaaS logging and monitoring tools such as Sumo Logic, Splunk, or Datadog
  • Proficiency in English

Responsibilities

  • Design, build, and maintain scalable and robust infrastructure for AI/ML (Artificial Intelligence / Machine Learning) systems, including cloud-based environments, containerization, and orchestration platforms
  • Develop and implement CI/CD pipelines to automate the deployment, testing, and monitoring of AI/ML models and applications
  • Evaluate and integrate new tools, technologies, and frameworks to improve the efficiency and effectiveness of our MLOps processes
  • Design and manage continuous deployment using Kubernetes, ArgoCD, and Jenkins
  • Maintain the associated container and model registries
  • Monitor infrastructure utilization and costs for model training and inference, including GPU usage
  • Monitor and troubleshoot AI/ML systems to ensure high availability, performance, and reliability