Engineering Member, Human Data

poolside

📍 Remote - United States

Job highlights

Summary

Join poolside, a remote-first AI company, as a Member of Engineering (Human Data) and lead the development and management of high-quality data labeling pipelines for large language models. You will build and manage an internal labeling team, collaborate with external vendors for crowdsourced data, and design scalable processes for data annotation. This critical role ensures our AI models are trained on top-tier data. You will optimize labeling pipelines, set up quality assurance processes, and work cross-functionally with researchers and engineers. The position requires experience in data labeling, managing vendors, and understanding data quality metrics. Poolside offers a fully remote work environment, generous vacation time, health insurance, and other benefits.

Requirements

  • Experience with designing and managing data labeling processes, with a strong emphasis on crowdsourcing solutions
  • 2+ years of experience in a technical role such as Data Engineer, Data Scientist, Technical Project Manager, or similar, ideally in machine learning/data-focused environments
  • Familiarity with managing vendors and crowdsourcing platforms to handle large-scale data labeling efforts
  • Strong understanding of data quality metrics such as accuracy, precision, recall, and F1 score
  • Proven ability to develop complex pipelines with multiple stages, particularly for data annotation and machine learning training
  • Ability to collaborate with technical teams and ensure labeling processes align with overall model development needs
  • Required: hands-on experience with crowdsourcing platforms (e.g., Scale AI, Toloka, or similar) for data labeling
  • Strong problem-solving skills and ability to work independently in a fast-paced environment

Responsibilities

  • Design, develop, and implement scalable data labeling pipelines that integrate into model training workflows
  • Manage and expand the internal data labeling team to meet the company's growing needs
  • Collaborate with external vendors to source and manage crowdsourced data labeling efforts, ensuring timely and high-quality delivery
  • Monitor and improve labeling processes by conducting experiments, ensuring data quality, and optimizing performance across labeling projects
  • Set up metrics and QA processes to evaluate the quality of labeled data and continuously improve output
  • Work cross-functionally with researchers and engineers to align labeling pipelines with model training needs
  • Identify new tools and technologies to streamline labeling processes and increase efficiency

Preferred Qualifications

  • Experience with cloud platforms and tools such as AWS, GCP, Kubernetes, and CI/CD systems is a plus

Benefits

  • Fully remote work & flexible hours
  • 37 days/year of vacation & holidays
  • Health insurance allowance for you and dependents
  • Company-provided equipment
  • Wellbeing, always-be-learning and home office allowances
  • Frequent team get-togethers
  • Great, diverse & inclusive people-first culture
