Research Engineer - Evaluations

Canva
Summary
Join Canva and help redefine how the world experiences design by building the next-generation evaluation system for generative AI models. As a Research/Machine Learning Engineer, you will engineer sophisticated AI agents to assess the quality and human alignment of generative design models. This role focuses on building practical systems that provide rapid feedback, guiding the future of design generation and empowering millions of users. You will work on agentic evaluation systems, inference-time alignment, and model benchmarking and analysis. The position requires expertise in generative AI models, data-driven evaluation, large-scale model training, and machine learning with PyTorch. Canva offers a range of benefits, including equity packages, inclusive parental leave, a wellbeing allowance, and flexible leave options.
Requirements
- You have a strong understanding of generative AI models (e.g., Diffusion Models, GANs, Transformers) and their architectures, with practical experience that informs robust evaluation strategies
- You excel at creating data-driven evaluation methodologies, turning user analytics into clear, actionable insights
- You've successfully managed or optimized large-scale distributed model training across hundreds of GPUs
- You have a solid understanding of machine learning, have worked with PyTorch, and know how to optimize PyTorch code for speed
- You have disciplined coding practices and are experienced with code reviews and pull requests
- You have experience working in cloud environments, ideally AWS
Responsibilities
- Design, build, and optimize the infrastructure for an "MLLM-as-a-Judge" evaluation system for scalable, automated feedback
- Implement and experiment with inference-time alignment techniques (Prompt Engineering, RAG, ICL) to directly improve model output quality
- Establish and manage a comprehensive benchmarking process to compare various foundation models on design-centric tasks
- Analyze evaluation data to identify model failure modes and provide actionable recommendations to the research team
- Collaborate with research scientists and ML engineers to integrate the agentic judge system into the model development lifecycle
- Translate the latest research in LLM evaluation and agentic AI into practical, production-ready engineering solutions
Preferred Qualifications
- Familiarity with evaluation libraries and frameworks
- Experience building or working with agentic AI systems or multi-agent coordination
- Knowledge of data visualization tools to communicate findings effectively
- A background or interest in human-computer interaction, design principles, or AI ethics
Benefits
- Equity packages - we want our success to be yours too
- Inclusive parental leave policy that supports all parents & carers
- An annual Vibe & Thrive allowance to support your wellbeing, social connection, home office setup & more
- Flexible leave options that empower you to be a force for good, take time to recharge, and support you personally