Senior Data Engineer

Tavus

📍Remote - United States

Summary

Join Tavus, a Series A company building the human layer of AI, as a Senior Data Engineer. You will own the entire data strategy, from sourcing and curating to structuring and optimizing data for high-quality AI models and products. This role requires a product-minded approach, anticipating future data needs and collaborating with ML engineers. You will be responsible for data hunting, mastering video data, optimizing labeling and automation, and leveraging internal data. The ideal candidate possesses a strategic mindset, extreme ownership, and technical expertise in Python, SQL, and large-scale data processing. Tavus offers a flexible work schedule, unlimited PTO, competitive healthcare and gear stipends, and a supportive team environment.

Requirements

  • You don’t just maintain – you build. From zero to fully running pipelines, you make things happen. You can take charge of how we use internal data to make smarter decisions
  • Extreme ownership – You own data strategy end-to-end, proactively working out what data we need, where to get it, and how to structure it for AI impact
  • Strategic mindset – You think beyond pipelines: you anticipate data needs before they arise and help shape AI development at Tavus
  • Automation expert – You know how to automate data cleaning, structuring, and labeling workflows for efficiency and scale
  • ML-first mindset – You understand that better data = better models and structure datasets to maximize AI model accuracy
  • Fast, but flawless. Speed matters, but so does accuracy. You balance both
  • You don’t follow best practices; you create them. A lot of what we’re doing is new, so you set the standard for how data should be done
  • Technical expertise – You have strong experience with Python, SQL, and large-scale data processing tools

Responsibilities

  • Be a data visionary – Anticipate the data needs not just for today, but for the future. Curate diverse, high-quality datasets to ensure AI models reach their full potential
  • Influence AI model training – Directly impact AI model performance, efficiency, and inference accuracy by collaborating with ML engineers to optimize datasets for maximum AI effectiveness
  • Own the data end-to-end – Take it from sourcing to structuring so it’s clean, scalable, and actually useful
  • Be a data hunter – Find, collect, and curate the best multimodal data (text, video, images) to power our models. Manage large-scale data procurement to ensure our models train on the highest quality information
  • Master video data – Own the challenges of AI-generated video, from proper classification and segmentation to structuring it for machine learning training. Ensure that our video datasets are structured for AI success
  • Optimize labeling & automation – Own the data labeling process and build automated workflows to make cleaning, labeling, and structuring data as efficient as possible. Work closely with our data annotation teams to ensure high-quality labeled data for ML models
  • Turn internal data into gold – Help unlock and use internal platform data to drive smarter decisions and supercharge growth
  • Speed + precision – Move fast, but don’t break data. Every pipeline, dataset, and workflow should be tight, efficient, and built to last

Preferred Qualifications

Previous work with LLMs or multimodal data is a big plus. You know how to source, structure, and optimize data for real AI impact

Benefits

  • Flexible work schedule
  • Unlimited PTO
  • Extremely competitive healthcare
  • Gear stipends
