Remote Senior MLOps Engineer

Tala

📍 Remote - Mexico

Job highlights

Summary

Join Tala's mission to unleash the economic power of the Global Majority by designing and implementing scalable infrastructure for AI/ML systems as a Senior Cloud Infrastructure Engineer.

Requirements

  • 4+ years of experience as a DevOps Engineer
  • 1 year of previous experience managing AI/ML infrastructure in public cloud environments
  • In-depth hands-on experience with at least one public cloud platform, preferably AWS
  • Experience with Python or any other programming language
  • Experience with Docker and Kubernetes in production
  • Experience with continuous deployment tools such as Jenkins or ArgoCD
  • Experience with SaaS logging and monitoring tools such as Sumo Logic, Splunk, or Datadog
  • Proficiency in English

Responsibilities

  • Design, build, and maintain scalable and robust infrastructure for AI/ML (Artificial Intelligence / Machine Learning) systems, including cloud-based environments, containerization, and orchestration platforms
  • Develop and implement CI/CD pipelines to automate the deployment, testing, and monitoring of AI/ML models and applications
  • Evaluate and integrate new tools, technologies, and frameworks to improve the efficiency and effectiveness of our MLOps processes
  • Design and manage continuous deployment using Kubernetes, ArgoCD, and Jenkins
  • Maintain the related container and model registries
  • Monitor infrastructure utilization and costs for model training, inference, and GPU usage
  • Monitor and troubleshoot AI/ML systems to ensure high availability, performance, and reliability
