Applied AI Software Engineer

Canvas Medical

💵 $300k-$400k
📍 Remote - United States

Summary

Join Canvas Medical, a leading EMR and payments development platform, as an Applied AI Software Engineer. You will spearhead evaluations of AI agents, ensuring their performance, safety, and reliability in automating customer workflows. Leveraging state-of-the-art foundation models, you will design and execute rigorous evaluation experiments across various clinical and operational use cases. This role demands deep experience in evaluating LLM-based agents at scale, creating high-fidelity evaluations, and defining ground truth outcomes. Collaboration with product, ML engineering, and clinical teams is crucial to ensure the trustworthiness and robustness of Canvas's AI agents. You will also work with marketing to communicate the value of these agents to the broader developer community and market.

Requirements

  • 5+ years of experience in applied machine learning or AI engineering, with a focus on evaluation and benchmarking
  • Proficiency with foundation model APIs and experience orchestrating complex agent behaviors via prompts or tools
  • Experience designing and running high-throughput evaluation pipelines, ideally including human-in-the-loop or expert-labeled benchmarks
  • Excellent Python engineering skills and familiarity with experiment management tools and data engineering toolsets in general, including SQL and database management

Responsibilities

  • Design and execute large-scale evaluation plans for LLM-based agents performing clinical documentation, scheduling, billing, communications, and general workflow automation tasks
  • Build end-to-end test harnesses that validate model behavior under different configurations (prompt templates, context sources, tool availability, etc.); a minimal illustrative sketch follows this list
  • Partner with clinicians to define accurate expected outcomes (gold standards) for performance comparisons in domains of clinical consequence, and with other subject matter experts in non-clinical domains
  • Run and replicate experiments across multiple models, parameters, and interaction types to determine optimal configurations
  • Deploy and maintain ongoing sampling for post-deployment governance of agent fleets
  • Analyze results and clearly summarize tradeoffs for product and engineering stakeholders, as well as for technical stakeholders among our customers and the broader market
  • Take ownership of internal eval tooling and infrastructure, ensuring speed, rigor, and reproducibility
  • Recommend candidates for reinforcement fine-tuning or retrieval augmentation based on gaps surfaced in evals
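As a rough illustration of the test-harness work described above, the sketch below sweeps a hypothetical agent callable over a handful of expert-labeled cases and scores each configuration by exact match. All names (EvalCase, run_eval, the toy agent, the configurations) are illustrative assumptions, not part of Canvas's actual tooling.

    # Minimal sketch of a configuration-sweep eval harness; all names are hypothetical.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass(frozen=True)
    class EvalCase:
        prompt: str   # task input, e.g. a scheduling request
        gold: str     # expert-labeled expected outcome

    def run_eval(agent: Callable[[str, dict], str],
                 cases: list[EvalCase],
                 configs: dict[str, dict]) -> dict[str, float]:
        """Score each configuration by exact-match accuracy against gold labels."""
        scores = {}
        for name, config in configs.items():
            hits = sum(agent(c.prompt, config).strip() == c.gold.strip() for c in cases)
            scores[name] = hits / len(cases) if cases else 0.0
        return scores

    if __name__ == "__main__":
        cases = [EvalCase("Reschedule Dr. Lee to Friday 2pm", "appointment_rescheduled")]
        configs = {"baseline": {"tools": []}, "with_calendar_tool": {"tools": ["calendar"]}}
        fake_agent = lambda p, cfg: "appointment_rescheduled" if cfg["tools"] else "needs_review"
        print(run_eval(fake_agent, cases, configs))

In practice the scoring function would be richer than exact match (rubric scoring, expert review, task-specific checks), but the sweep-and-score structure is the core of the role's harness-building work.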

Preferred Qualifications

  • Familiarity with clinical or healthcare data is a strong plus
  • Experience with reinforcement fine-tuning, model monitoring, or RLHF is a plus

Benefits

  • Competitive Salary & Equity Package
  • Health Insurance
  • Home Office Stipend
  • 401k
  • Paid Maternity/Paternity Leave (12 weeks)
  • Flexible/unlimited PTO

