AI and Machine Learning Site Reliability Engineer
Oomnitza
Job highlights
Summary
Join Oomnitza, a leading provider of Enterprise Technology Management platforms, as an experienced AI & ML Site Reliability Engineer. You will play a crucial role in architecting and maintaining the infrastructure supporting our AI and data-driven solutions. Responsibilities include building and maintaining our big data analytics platform, developing AI architecture, implementing vector databases and knowledge graphs, and optimizing systems for ML model deployment. You will collaborate closely with data scientists, engineers, and product teams to ensure our customers benefit from streamlined workflows and scalable AI systems. This position requires a Bachelor's degree, 5+ years of relevant experience, and proficiency in various AI/ML tools and technologies. Oomnitza offers a comprehensive benefits package, including health insurance, retirement benefits, remote work options, and professional development opportunities.
Requirements
- Bachelor’s degree in Computer Science, Engineering, Data Science, or a related field
- 5+ years of experience in site reliability engineering, DevOps, MLOps, or a similar role
- Experience with cloud platforms such as AWS, GCP, or Azure, including AI/ML services (e.g., SageMaker, Google Colab, Vertex AI)
- Proficient in deploying machine learning models (e.g., regressions, decision trees, neural networks, recommendation systems) into production and managing the model lifecycle
- Experience with data processing tools such as Apache Spark, Hadoop, or Airflow for large-scale data processing
- Experience with AI/ML tools and frameworks (e.g., TensorFlow, PyTorch, LangChain, Hugging Face)
- Strong understanding of vector databases (e.g., Pinecone, Milvus, Chroma) and knowledge graph tools (e.g., Neo4j, RDF)
- Experience with RAG (Retrieval-Augmented Generation) techniques and GraphRAG systems
- Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes)
- Proficiency in programming languages such as Python and Bash, and experience with ML tools and libraries
- Experience implementing CI/CD for ML pipelines and working with ML version control systems (e.g., DVC, MLflow)
- Experience in on-call incident response in high-uptime environments
- Intellectual curiosity, with a hunger to know how things work and to question established ideas, concepts, and frameworks
- Spirit of service, with a “how can I serve” attitude centered on delivering value to the greater team, the overall company, and our broader community of customers
- Ability to embrace ambiguity and apply structured thinking and problem-solving skills
- Entrepreneurial spirit with an enthusiasm to take on new challenges
- Excellent communication and collaboration skills
Responsibilities
- Build and maintain Oomnitza’s big data analytics platform that centralizes data from multiple customer instances and serves analytics and AI solutions
- Design and build scalable, secure, and efficient AI infrastructure to support training and deploying machine learning models and AI software solutions
- Implement and manage vector databases for storing high-dimensional data and knowledge graphs to integrate structured and unstructured data
- Develop and integrate retrieval-augmented generation systems for more accurate, scalable, and context-aware models, including GraphRAG for advanced reasoning
- Work with data scientists to train, optimize, and fine-tune large language models (LLMs) for specific business applications and ensure seamless integration with existing systems
- Deploy, manage, and monitor ML models in production, ensuring system reliability, scalability, and performance
- Implement continuous integration and continuous deployment (CI/CD) processes tailored for machine learning, ensuring reproducibility and automation
- Work with data scientists and the AI product management team to develop and manage AI agents for task automation, process optimization, and adaptive learning systems
- Ensure model performance monitoring, retraining, and governance protocols are in place for reliable and ethical AI usage
- Work closely with data scientists, ML engineers, and cross-functional teams to support development, testing, and deployment needs
Preferred Qualifications
- Master’s degree in a related field
- Understanding of model governance, ethics, and AI risk management
- Experience with private LLM fine-tuning and optimization
- Familiarity with agent development for automation tasks
- Experience deploying AI/ML models directly on edge devices, such as smartphones, IoT devices, or embedded systems
- Knowledge of advanced data infrastructure, including distributed systems and database design
Benefits
- A progressive, healthy work culture with excellent opportunities for professional and personal development
- Top performers will have an opportunity to help shape the team, working directly with the founders to drive initiatives and create a structure that scales
- A once-in-a-lifetime career opportunity to get onboard a fast-growing business that is venture-backed by C5 Capital, Shasta Ventures, Riverside Acceleration Capital, and Hummer Winblad
- Dental & Vision Insurance
- Employee equity plan
- Health Insurance for your spouse and dependents
- Pension, life insurance, and income protection
- Remote working & flexible work schedules
- Work-from-home equipment allowance
- Choice of preferred equipment, Mac or PC
- Regular, fun social events and workshops