Senior Machine Learning Operations Engineer, Computer Vision

Dandy
Summary
Join Dandy's rapidly growing Machine Learning team as a Senior MLOps Engineer and play a key role in the success of our team and company. You will be challenged to learn new technologies, establish best practices, and solve problems independently. We are creating next-generation experiences across the newly 3D-digitized dental stack with ML models, so our ML platform is critical to our success. In this role, you will be central to developing the ML platform we use to build state-of-the-art machine learning models that revolutionize the digital dental industry. You will collaborate with ML engineers and other stakeholders to design, implement, and maintain MLOps pipelines and cloud-based infrastructure. You will also develop and implement automation strategies and monitoring solutions to ensure high quality and compliance.
Requirements
- 5+ years of software experience and 3+ years of MLOps engineering experience, preferably in a high-growth startup environment
- Hands-on experience working with ML models for performance and training optimizations, hyperparameter tuning, model monitoring, evaluation, and benchmarking
- Familiarity with ML frameworks and libraries (e.g., TensorFlow, PyTorch, scikit-learn)
- Experience building and maintaining CI/CD pipelines with best practices
- Familiarity with containerization tools (e.g., Docker, Kubernetes) and orchestration platforms (e.g., Kubeflow)
- Comfort working in a highly agile, intensely iterative software development process
- Self-motivated and self-managing, with a strong sense of ownership and excellent organizational skills
- Ability to thrive in a remote-first organization
Responsibilities
- In collaboration with ML engineers, design and implement MLOps pipelines for 2D & 3D dataset curation, model training, evaluation, optimization, and deployment
- Manage and optimize cloud-based infrastructure for ML workloads, including scaling, resource allocation, and cost management
- Help engineer information feedback loops to continuously improve our machine learning models
- Develop and implement automation strategies for model training, evaluation, optimization, and deployment to improve efficiencies
- Develop and manage monitoring solutions using GCP tools like Cloud Monitoring and Cloud Logging to track model performance, system health, and operational metrics
- Ensure that ML operations comply with data security and privacy regulations, utilizing security features and best practices
- Collaborate with other stakeholders within Engineering and Data to maintain a high bar for quality in a fast-paced, iterative environment
Preferred Qualifications
- 1+ years of experience working directly with machine learning model training and evaluation
- Hands-on experience with at least one major cloud platform, such as AWS, GCP, or Azure; experience with Google Cloud services (e.g., Vertex AI, BigQuery, Dataflow, Compute Engine, Kubernetes Engine) is preferred
Benefits
- Healthcare
- Dental
- Mental health support
- Parental planning resources
- Retirement savings options
- Generous paid time off