Research Scientist at Snorkel AI

Summary

Join Snorkel AI, a company democratizing AI through its data development platform, as a researcher on the Expert Data-as-a-Service (DaaS) team. You will conduct research on data curation and generation, collaborate with customer research teams to translate their goals into data requirements, design and prototype data generation pipelines, build evaluators to measure data quality, and write Python code for experiments and production pipelines. The role involves close collaboration with engineering, operations, and customer research teams. You will iterate rapidly on solutions based on feedback and evolving DaaS requirements. This position offers competitive compensation and equity opportunities within a growth-oriented environment.

Requirements

PhD in Computer Science or a related field, with a focus on data-centric AI and synthetic data generation
Strong foundation in large language models, generative AI, or data generation techniques, especially for supervised fine-tuning and reinforcement learning
Experience developing, experimenting with, and deploying AI models and data pipelines at scale
Solid programming skills in Python; familiarity with ML frameworks such as PyTorch and HuggingFace, and with software engineering best practices and clean coding
Track record of working in fast-paced, iterative environments and handling uncertainty in project requirements
Bias for action: comfortable rolling up your sleeves, experimenting, and iterating quickly to solve problems
Strong communication and collaboration skills, especially when working across research, engineering, and delivery teams

Responsibilities

Conduct research on data curation and generation to support emerging use cases across domains
Collaborate with customer research teams to translate their high-level goals into data requirements, annotation guidelines, and workflows
Design and prototype data generation and curation pipelines that feed directly into Data-as-a-Service offerings
Build sophisticated evaluators to measure quality in our data, including coverage, bias, and utility
Write clear, maintainable Python code to support experiments and production pipelines; contribute to internal tooling and shared libraries
Iterate rapidly on solutions based on customer feedback, emerging research, and evolving DaaS requirements
Collaborate cross-functionally with delivery managers, vendors, and engineering teams to take research to production

Preferred Qualifications

Past experience in data labeling, annotation, or curation projects
Publications or contributions related to data curation for LLM fine-tuning
Knowledge of production workflows for DaaS offerings or data delivery teams
Familiarity with quality control processes for high-volume data pipelines

Benefits

Comprehensive medical, dental, and vision plans for Snorkelers and their families
Yearly wellness stipend
401(k) program
Parental leave program with up to 20 weeks of paid time off for new parents
Workstation setup allowance

Remote

Data

Mid-level
