Summary

Join Oomnitza, a leading provider of Enterprise Technology Management platforms, as an experienced AI & ML Site Reliability Engineer. You will play a crucial role in architecting and maintaining the infrastructure supporting our AI and data-driven solutions. Responsibilities include building and maintaining our big data analytics platform, developing AI architecture, implementing vector databases and knowledge graphs, and optimizing systems for ML model deployment. You will collaborate closely with data scientists, engineers, and product teams to ensure our customers benefit from streamlined workflows and scalable AI systems. This position requires a Bachelor's degree, 5+ years of relevant experience, and proficiency in various AI/ML tools and technologies. Oomnitza offers a comprehensive benefits package, including health insurance, retirement benefits, remote work options, and professional development opportunities.

Requirements

Bachelor’s degree in Computer Science, Engineering, Data Science, or a related field
5+ years of experience in site reliability engineering, DevOps, MLOps, or a similar role
Experience with cloud platforms such as AWS, GCP, or Azure, including AI/ML services (e.g., SageMaker, Google Colab, Vertex AI)
Proficiency in deploying machine learning models (e.g., regressions, decision trees, neural networks, recommendation systems) into production and managing the model lifecycle
Experience with data processing tools such as Apache Spark, Hadoop, or Airflow for large-scale data processing
Experience with AI/ML tools and frameworks (e.g., TensorFlow, PyTorch, LangChain, Hugging Face)
Strong understanding of vector databases (e.g., Pinecone, Milvus, Chroma) and knowledge graph tools (e.g., Neo4j, RDF)
Experience with RAG (Retrieval-Augmented Generation) techniques and GraphRAG systems
Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes)
Proficiency in programming languages such as Python and Bash, and experience with ML tools and libraries
Experience implementing CI/CD for ML pipelines and working with ML version control systems (e.g., DVC, MLflow)
Experience in on-call incident response in high-uptime environments
Intellectual curiosity with a hunger to know how things work and question established ideas, concepts and frameworks
Spirit of service: a “how can I serve” attitude centered on delivering value to the greater team, the overall company, and our broader community of customers
Ability to embrace ambiguity and apply structured thinking and problem-solving skills
Entrepreneurial spirit with an enthusiasm to take on new challenges
Excellent communication and collaboration skills

Responsibilities

Build and maintain Oomnitza’s big data analytics platform that centralizes data from multiple customer instances and serves analytics and AI solutions
Design and build scalable, secure, and efficient AI infrastructure to support training and deploying machine learning models and AI software solutions
Implement and manage vector databases for storing high-dimensional data and knowledge graphs to integrate structured and unstructured data
Develop and integrate retrieval-augmented generation systems for more accurate, scalable, and context-aware models, including GraphRAG for advanced reasoning
Work with data scientists to train, optimize, and fine-tune large language models (LLMs) for specific business applications and ensure seamless integration with existing systems
Deploy, manage, and monitor ML models in production, ensuring system reliability, scalability, and performance
Implement continuous integration and continuous deployment (CI/CD) processes tailored for machine learning, ensuring reproducibility and automation
Work with data scientists and the AI product management team to develop and manage AI agents for task automation, process optimization, and adaptive learning systems
Ensure model performance monitoring, retraining, and governance protocols are in place for reliable and ethical AI usage
Work closely with data scientists, ML engineers, and cross-functional teams to support development, testing, and deployment needs

Preferred Qualifications

Master’s degree in a related field
Understanding of model governance, ethics, and AI risk management
Experience with private LLM fine-tuning and optimization
Familiarity with agent development for automation tasks
Experience deploying AI/ML models directly on edge devices, such as smartphones, IoT devices, or embedded systems
Knowledge of advanced data infrastructure, including distributed systems and database design

Benefits

A progressive, healthy work culture with excellent opportunities for professional and personal development
Top performers will have an opportunity to help shape the team, working directly with the founders to drive initiatives and create a structure that scales
A once-in-a-lifetime career opportunity to get onboard a fast-growing business that is venture-backed by C5 Capital, Shasta Ventures, Riverside Acceleration Capital, and Hummer Winblad
Dental & Vision Insurance
Employee equity plan
Health Insurance for your spouse and dependents
Pension, life insurance, and income protection
Remote working & flexible work schedules
Working from home equipment allowance
Choice of preferred equipment, Mac or PC
Regular, fun social events and workshops

AI and Machine Learning Site Reliability Engineer

Oomnitza

Remote

DevOps

Senior
