Applied AI Software Engineer

Canvas Medical

💵 $300k-$400k
📍 Remote - United States

Summary

Join Canvas Medical, a leading EMR and payments development platform, as an Applied AI Software Engineer. You will spearhead evaluations of AI agents, ensuring their performance, safety, and reliability in automating customer workflows. Leveraging state-of-the-art foundation models, you will design and execute rigorous evaluation experiments across various clinical and operational use cases. This role demands deep experience in evaluating LLM-based agents at scale, creating high-fidelity evaluations, and defining ground truth outcomes. Collaboration with product, ML engineering, and clinical teams is crucial to ensure the trustworthiness and robustness of Canvas's AI agents. You will also work with marketing to communicate the value of these agents to the broader developer community and market.

Requirements

  • 5+ years of experience in applied machine learning or AI engineering, with a focus on evaluation and benchmarking
  • Proficiency with foundation model APIs and experience orchestrating complex agent behaviors via prompts or tools
  • Experience designing and running high-throughput evaluation pipelines, ideally including human-in-the-loop or expert-labeled benchmarks
  • Excellent Python engineering skills and familiarity with experiment management tools and data engineering toolsets in general, including SQL and database management

Responsibilities

  • Design and execute large-scale evaluation plans for LLM-based agents performing clinical documentation, scheduling, billing, communications, and general workflow automation tasks
  • Build end-to-end test harnesses that validate model behavior under different configurations (prompt templates, context sources, tool availability, etc.); a minimal illustrative sketch follows this list
  • Partner with clinicians to define accurate expected outcomes (gold standards) for performance comparisons in domains of clinical consequence, and with other subject matter experts in non-clinical domains
  • Run and replicate experiments across multiple models, parameters, and interaction types to determine optimal configurations
  • Deploy and maintain ongoing sampling for post-deployment governance of agent fleets
  • Analyze results and clearly summarize tradeoffs for product and engineering stakeholders, as well as for technical stakeholders among our customers and the broader market
  • Take ownership of internal eval tooling and infrastructure, ensuring speed, rigor, and reproducibility
  • Recommend candidates for reinforcement fine-tuning or retrieval augmentation based on gaps surfaced in evals
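As a rough illustration of the test-harness work described above, the sketch below sweeps a hypothetical agent callable over a handful of expert-labeled cases and scores each configuration by exact match. All names (EvalCase, run_eval, the toy agent, the configurations) are illustrative assumptions, not part of Canvas's actual tooling.

    # Minimal sketch of a configuration-sweep eval harness; all names are hypothetical.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass(frozen=True)
    class EvalCase:
        prompt: str   # task input, e.g. a scheduling request
        gold: str     # expert-labeled expected outcome

    def run_eval(agent: Callable[[str, dict], str],
                 cases: list[EvalCase],
                 configs: dict[str, dict]) -> dict[str, float]:
        """Score each configuration by exact-match accuracy against gold labels."""
        scores = {}
        for name, config in configs.items():
            hits = sum(agent(c.prompt, config).strip() == c.gold.strip() for c in cases)
            scores[name] = hits / len(cases) if cases else 0.0
        return scores

    if __name__ == "__main__":
        cases = [EvalCase("Reschedule Dr. Lee to Friday 2pm", "appointment_rescheduled")]
        configs = {"baseline": {"tools": []}, "with_calendar_tool": {"tools": ["calendar"]}}
        fake_agent = lambda p, cfg: "appointment_rescheduled" if cfg["tools"] else "needs_review"
        print(run_eval(fake_agent, cases, configs))

In practice the scoring function would be richer than exact match (rubric scoring, expert review, task-specific checks), but the sweep-and-score structure is the core of the role's harness-building work.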

Preferred Qualifications

  • Familiarity with clinical or healthcare data is a strong plus
  • Experience with reinforcement fine-tuning, model monitoring, or RLHF is a plus

Benefits

  • Competitive Salary & Equity Package
  • Health Insurance
  • Home Office Stipend
  • 401k
  • Paid Maternity/Paternity Leave (12 weeks)
  • Flexible/unlimited PTO

