Summary
Join Zeals, a rapidly growing tech company in Japan, and help revolutionize digital hospitality through conversational commerce technology. As a Senior MLOps Engineer, you will play a key role in deploying, optimizing, and monitoring LLMs in production environments. You will build and maintain scalable pipelines, ensure low-latency inference, and implement best practices in monitoring and observability. The role involves working with state-of-the-art tools such as Hugging Face and MLflow. Zeals offers a fair, highly flexible, and inclusive working environment with opportunities for significant impact. The company recently secured nearly $40 million in funding, underscoring its potential for growth and expansion.
Requirements
- 5+ years of experience in MLOps, DevOps, or related fields, with a focus on deploying and managing LLMs or other large-scale machine learning models
- Proven experience with tools such as Hugging Face and MLflow, and with containerization technologies (Docker, Kubernetes)
- Strong experience with cloud platforms (AWS, Azure, GCP) and infrastructure as code (Terraform)
- Hands-on experience in reducing inference latency and optimizing AI infrastructure
- Proficiency in Python, with experience in ML libraries such as TensorFlow, PyTorch, and related frameworks
- Expertise in CI/CD pipelines, version control (Git), and orchestration tools
- Familiarity with Generative AI, prompt engineering, and deploying models at scale
- Excellent problem-solving skills with the ability to tackle complex challenges independently
- Strong communication skills, with the ability to translate technical concepts for non-technical stakeholders
- A proactive mindset with a focus on continuous learning and staying updated with industry trends
Responsibilities
- Develop and maintain scalable pipelines for deploying LLMs, focusing on efficient, low-latency inference
- Utilize tools such as Hugging Face and MLflow for seamless model integration and version control
- Automate deployment processes, including model validation and continuous integration
- Implement comprehensive monitoring frameworks to track performance and reliability of models in production
- Use advanced observability tools to proactively detect and address performance issues
- Deploy alerting systems to ensure rapid response to anomalies in model behavior
- Architect and optimize cloud and on-premise infrastructure to support large-scale LLM operations
- Collaborate with cloud providers like AWS, Azure, and GCP to optimize costs and performance
- Work with backend engineers to ensure smooth integration of AI models into conversational platforms
- Partner with AI engineers and data scientists to align on project objectives and deployment strategies
- Document MLOps processes, best practices, and tools to maintain operational excellence
- Provide training and support to team members on MLOps methodologies and tools
Benefits
- Competitive salary
- Performance review: twice a year
- 10 days of paid leave during the first year; weekends off, national holidays, summer and winter breaks, and refreshment leave
- Visa support: We sponsor visas for the right candidates
- Highly flexible, remote-first organization (within Japan); interim work from overseas is possible
- Housing allowance (for residences within 1.5 km of the office)
- Club activity allowance
- Team-building allowance and lunch allowance
- Zeals Bar (bi-monthly free-flow beer party)