Research Scientist

Snorkel AI

💵 $140k-$275k
📍 Remote - United States

Summary

Join Snorkel AI, a company democratizing AI through its data development platform, as a researcher on the Expert Data-as-a-Service (DaaS) team. You will conduct research on data curation and generation, collaborate with customer research teams to translate their goals into data requirements, design and prototype data generation pipelines, build evaluators to measure data quality, and write Python code for experiments and production pipelines. The role involves close collaboration with engineering, operations, and customer research teams. You will iterate rapidly on solutions based on feedback and evolving DaaS requirements. This position offers competitive compensation and equity opportunities within a growth-oriented environment.

Requirements

  • PhD in Computer Science or a related field, with a focus on data-centric AI and synthetic data generation
  • Strong foundation in large language models, generative AI, or data generation techniques, especially for supervised fine-tuning and reinforcement learning
  • Experience developing, experimenting with, and deploying AI models and data pipelines at scale
  • Solid programming skills in Python; familiarity with ML frameworks such as PyTorch and HuggingFace, and with software engineering best practices and clean coding
  • Track record of working in fast-paced, iterative environments and handling uncertainty in project requirements
  • Bias for action: comfortable rolling up your sleeves, experimenting, and iterating quickly to solve problems
  • Strong communication and collaboration skills, especially when working across research, engineering, and delivery teams

Responsibilities

  • Conduct research on data curation and generation to support emerging use cases across domains
  • Collaborate with customer research teams to translate their high-level goals into data requirements, annotation guidelines, and workflows
  • Design and prototype data generation and curation pipelines that feed directly into Data-as-a-Service offerings
  • Build sophisticated evaluators to measure quality in our data, including coverage, bias, and utility
  • Write clear, maintainable Python code to support experiments and production pipelines; contribute to internal tooling and shared libraries
  • Iterate rapidly on solutions based on customer feedback, emerging research, and evolving DaaS requirements
  • Collaborate cross-functionally with delivery managers, vendors, and engineering teams to take research to production

Preferred Qualifications

  • Past experience in data labeling, annotation, or curation projects
  • Publications or contributions related to data curation for LLM fine-tuning
  • Knowledge of production workflows for DaaS offerings or data delivery teams
  • Familiarity with quality control processes for high-volume data pipelines

Benefits

  • Comprehensive medical, dental, and vision plans for Snorkelers and their families
  • Yearly wellness stipend
  • 401k program
  • Parental leave program with up to 20 weeks of paid time off for new parents
  • Workstation setup allowance
