Summary

Join Blueprint, a company that builds AI-powered tools for therapists, as an experienced AI/ML Engineer. You will own the evaluation and quality of our AI systems: designing evaluation infrastructure, defining quality metrics, and building tools to track and improve model outputs so the systems stay reliable and safe. You will work closely with engineering, product, and clinical leaders to define quality in practical terms and deliver it consistently. Your work will directly affect tens of thousands of therapists. This is a highly cross-functional, high-impact role at a remote-first company.

Requirements

You’re a hands-on ML/AI practitioner who’s passionate about building high-quality systems that actually get used — not just optimizing for benchmark scores
You’ve worked with LLMs in production at scale and know the hard part is making outputs reliable, human-aligned, and easy to evaluate
You’re motivated by impact, comfortable with ambiguity, and thrive in early-stage, fast-paced environments
You’ve built or owned evaluation infrastructure for LLMs or generative AI products
You have experience designing QA workflows, human-in-the-loop systems, or LLM-as-a-judge pipelines
You think in terms of feedback loops and can turn fuzzy product goals into testable quality metrics
You write code, ship experiments, and are comfortable working across the stack to get the right signals flowing
You’re excited about working closely with product, design, and domain experts to define and refine what “good” means in a real-world AI application

Responsibilities

Design and build our end-to-end evaluation infrastructure: LLM-as-a-judge, human QA pipelines, offline scoring, and more
Define and implement application-specific quality metrics — not just accuracy, but tone, structure, clinical alignment, and more
Collaborate with product and clinical leads to turn subjective requirements into structured evaluation criteria
Monitor and analyze model performance across different therapist cohorts and workflows
Build tools and processes to capture in-the-wild feedback from clinicians and route it back into model and product improvement loops
Work closely with engineers to integrate eval into our CI, deployment, and iteration cycles
Help shape data labeling, prompt evaluation, experiment design, and prompt tuning frameworks

Preferred Qualifications

Experience in healthcare, mental health, or other high-trust environments
Familiarity with labeling, data QA, or prompt engineering at scale
A strong POV on eval tools, metrics, or best practices — and a willingness to invent new ones where needed

Benefits

Competitive salary and equity
100% remote – no office, no commuting
Health, dental, and vision insurance, with 75% of your premium covered by Blueprint
Semi-annual team gatherings (in Chicago!)
Unlimited PTO
Opportunities to grow with the company and shape our product
Hardworking, mission-driven, friendly coworkers

AI/ML Engineer

Blueprint

Remote

Software Development

Mid-level
