AI Data Lead

Clarifai

πŸ“Remote - India

Summary

Join Clarifai as an AI Data Lead and own the end-to-end process of creating and curating high-quality datasets for our AI models. You will be a power user of Clarifai's data labeling products and provide feedback to improve them. The initial focus is on building next-generation vision datasets, particularly full-motion video, with later expansion to language datasets for NLP models. You will collaborate with ML and product teams, design data acquisition strategies, build and maintain data pipelines, and manage third-party labeling vendors. You will also implement a QA framework, act as a key internal customer for Clarifai's products, mentor data labeling partners, foster a culture of data excellence, and communicate effectively with stakeholders.

Requirements

  • 3+ years in data engineering, with a proven history of building and managing complex data pipelines
  • Direct, hands-on experience managing third-party data labeling services or in-house annotation teams
  • Experience working with large-scale vision datasets (image or video)
  • Deep understanding of data labeling processes and quality metrics
  • Strong proficiency in Python and SQL
  • Experience with cloud data services (AWS, GCP, or Azure)
  • Exceptional project management, communication, and vendor management skills
  • A meticulous eye for detail and an unwavering commitment to data quality

Responsibilities

  • Collaborate with ML and product teams to define data requirements, starting with complex video and image use cases and expanding into text and language
  • Design and execute a comprehensive strategy for data acquisition and augmentation
  • Build, scale, and maintain robust data pipelines to ingest, process, and version large-scale multimedia datasets
  • Leverage Clarifai's automated and AI-assisted labeling tools to efficiently pre-label data and manage human-in-the-loop workflows
  • Serve as the primary lead for external data labeling vendors, who will often verify or enrich AI-generated labels, and ensure projects are on time and within budget
  • Author crystal-clear labeling instructions for complex tasks, from object tracking in video to, eventually, named entity recognition in text
  • Implement and manage a rigorous quality assurance (QA) framework for both AI- and human-generated labels
  • Act as a key internal customer for Clarifai's data labeling products
  • Provide structured, expert feedback to our product and engineering teams to identify bugs, suggest feature enhancements, and guide the product roadmap
  • Continuously evaluate and pioneer new strategies for combining automated labeling with human verification to maximize quality and efficiency
  • Lead and mentor a focused set of data labeling partners
  • Foster a culture of data excellence, ownership, and continuous improvement
  • Communicate project status, challenges, and outcomes effectively to all stakeholders, and keep track of budgets

Preferred Qualifications

  • Specific experience with the complexities of full-motion video datasets and annotation (e.g., temporal consistency, event tagging)
  • Experience in an environment where you regularly used internal tools and provided feedback for their improvement ("dogfooding")
  • Experience with large-scale language or text datasets
  • Previous experience in a technical leadership or mentorship role
  • Experience using a variety of data annotation platforms and tools
