Research Engineer - Evaluations at Canva

Summary

Join Canva and help redefine how the world experiences design by building the next-generation evaluation system for generative AI models. As a Research/Machine Learning Engineer, you will engineer sophisticated AI agents to assess the quality and human alignment of generative design models. This role focuses on building practical systems that provide rapid feedback, guiding the future of design generation and empowering millions of users. You will work on agentic evaluation systems, inference-time alignment, and model benchmarking and analysis. The position requires expertise in generative AI models, data-driven evaluation, large-scale model training, and machine learning with PyTorch. Canva offers a range of benefits, including equity packages, inclusive parental leave, a wellbeing allowance, and flexible leave options.

Requirements

You have a strong understanding of generative AI models (e.g., diffusion models, GANs, Transformers) and their architectures, with practical experience that informs robust evaluation strategies
You excel at creating data-driven evaluation methodologies, turning user analytics into clear, actionable insights
You’ve successfully managed or optimized large-scale distributed model training across hundreds of GPUs
You have a solid understanding of machine learning, have worked with PyTorch, and know how to optimize such code for speed
You follow disciplined coding practices and are experienced with code reviews and pull requests
You have experience working in cloud environments, ideally AWS

Responsibilities

Design, build, and optimize the infrastructure for an "MLLM-as-a-Judge" evaluation system for scalable, automated feedback
Implement and experiment with inference-time alignment techniques (prompt engineering, retrieval-augmented generation (RAG), in-context learning (ICL)) to directly improve model output quality
Establish and manage a comprehensive benchmarking process to compare various foundation models on design-centric tasks
Analyze evaluation data to identify model failure modes and provide actionable recommendations to the research team
Collaborate with research scientists and ML engineers to integrate the agentic judge system into the model development lifecycle
Translate the latest research in LLM evaluation and agentic AI into practical, production-ready engineering solutions

Preferred Qualifications

Familiarity with evaluation libraries and frameworks
Experience building or working with agentic AI systems or multi-agent coordination
Knowledge of data visualization tools to communicate findings effectively
A background or interest in human-computer interaction, design principles, or AI ethics

Benefits

Equity packages - we want our success to be yours too
Inclusive parental leave policy that supports all parents & carers
An annual Vibe & Thrive allowance to support your wellbeing, social connection, home office setup & more
Flexible leave options that empower you to be a force for good, take time to recharge, and support you personally

Location: Remote
Category: All Others
Level: Mid-level
