AI Specialist

Manila Recruitment
Summary
Join our team to lead the architecture and implementation of MLOps/LLMOps systems within OpenShift AI. You will establish best practices for scalability, reliability, and maintainability while contributing to open-source communities. You will design and develop robust, production-grade features focused on AI trustworthiness, including model monitoring, and drive technical decision-making and standards for model deployment, monitoring, and validation pipelines. The role also involves collaborating with product management, providing technical leadership in customer-facing discussions, leading code reviews, and resolving complex technical challenges in production environments. You will partner with cross-functional teams to establish technical roadmaps and align engineering capabilities with product vision, and you will mentor team members on MLOps best practices, fostering a culture of engineering excellence.
Requirements
- 5+ years of software engineering experience, with at least 4 years focusing on ML/AI systems in production environments
- Strong expertise in Python, with demonstrated experience building and deploying production ML systems
- Deep understanding of Kubernetes and container orchestration, particularly in ML workload contexts
- Extensive experience with MLOps tools and frameworks (e.g., KServe, Kubeflow, MLflow, or similar)
- Track record of technical leadership in open-source projects, including significant contributions and community engagement
- Proven experience architecting and implementing large-scale distributed systems
- Strong background in software engineering best practices, including CI/CD, testing, and monitoring
- Experience mentoring engineers and driving technical decisions in a team environment
Responsibilities
- Lead the architecture and implementation of MLOps/LLMOps systems within OpenShift AI, establishing best practices for scalability, reliability, and maintainability while actively contributing to relevant open-source communities
- Design and develop robust, production-grade features focused on AI trustworthiness, including model monitoring
- Drive technical decision-making around system architecture, technology selection, and implementation strategies for key MLOps components, with a focus on open-source technologies
- Define and implement technical standards for model deployment, monitoring, and validation pipelines, while mentoring team members on MLOps best practices and engineering excellence
- Collaborate with product management to translate customer requirements into technical specifications, architect solutions that address scalability and performance challenges, and provide technical leadership in customer-facing discussions
- Lead code reviews, architectural reviews, and technical documentation efforts to ensure high code quality and maintainable systems across distributed engineering teams
- Identify and resolve complex technical challenges in production environments, particularly around model serving, scaling, and reliability in enterprise Kubernetes deployments
- Partner with cross-functional teams to establish technical roadmaps, evaluate build-vs-buy decisions, and ensure alignment between engineering capabilities and product vision
- Provide technical mentorship to team members, including code review feedback, architecture guidance, and career development support while fostering a culture of engineering excellence
Preferred Qualifications
- Experience with Red Hat OpenShift or similar enterprise Kubernetes platforms
- Contributions to ML/AI open-source projects, particularly in the MLOps/GitOps space
- Background in implementing ML model monitoring
- Experience with LLM operations and deployment at scale
- Public speaking experience at technical conferences
- Advanced degree in Computer Science, Machine Learning, or a related field
- Experience working with distributed engineering teams across multiple time zones