AI/ML Engineer

Blueprint

📍Remote - United States

Summary

Join Blueprint, a company that builds AI-powered tools for therapists, as an experienced AI/ML Engineer. You will own the evaluation and quality of Blueprint's AI systems: designing evaluation infrastructure, defining quality metrics, and building tools to track and improve model outputs so that those systems stay reliable and safe. You will work closely with engineering, product, and clinical leaders to define quality in practical terms and ensure it is delivered consistently. Your work will directly impact tens of thousands of therapists. This is a highly cross-functional, high-impact role at a remote-first company.

Requirements

  • You’re a hands-on ML/AI practitioner who’s passionate about building high-quality systems that actually get used — not just optimizing for benchmark scores
  • You’ve worked with LLMs in production at scale and know the hard part is making outputs reliable, human-aligned, and easy to evaluate
  • You’re motivated by impact, comfortable with ambiguity, and thrive in early-stage, fast-paced environments
  • You’ve built or owned evaluation infrastructure for LLMs or generative AI products
  • You have experience designing QA workflows, human-in-the-loop systems, or LLM-as-a-judge pipelines
  • You think in terms of feedback loops — and can turn fuzzy product goals into testable quality metrics
  • You write code, ship experiments, and are comfortable working across the stack to get the right signals flowing
  • You’re excited about working closely with product, design, and domain experts to define and refine what “good” means in a real-world AI application

Responsibilities

  • Design and build our end-to-end evaluation infrastructure: LLM-as-a-judge, human QA pipelines, offline scoring, and more
  • Define and implement application-specific quality metrics — not just accuracy, but tone, structure, clinical alignment, and more
  • Collaborate with product and clinical leads to turn subjective requirements into structured evaluation criteria
  • Monitor and analyze model performance across different therapist cohorts and workflows
  • Build tools and processes to capture in-the-wild feedback from clinicians and route it back into model and product improvement loops
  • Work closely with engineers to integrate eval into our CI, deployment, and iteration cycles
  • Help shape data labeling, prompt evaluation, experiment design, and prompt tuning frameworks

Preferred Qualifications

  • Experience in healthcare, mental health, or other high-trust environments
  • Familiarity with labeling, data QA, or prompt engineering at scale
  • A strong POV on eval tools, metrics, or best practices — and a willingness to invent new ones where needed

Benefits

  • Competitive salary and equity
  • 100% remote – no office, no commuting
  • Health, dental, and vision insurance, with 75% of your premium covered by Blueprint
  • Semi-annual team gatherings (in Chicago!)
  • Unlimited PTO
  • Opportunities to grow with the company and shape our product
  • Hardworking, mission-driven, friendly coworkers
