Summary
Join Grafana as a Senior Engineer in GenAI & ML Evaluation Frameworks and help build and improve internal evaluation frameworks for Generative AI systems, particularly Large Language Models (LLMs). This remote position, open to Canadian applicants only, involves designing and scaling automated evaluation pipelines, integrating them into CI/CD workflows, and defining relevant metrics. You will design robust evaluation frameworks, develop automated evaluation tooling, define and refine evaluation metrics, and lead dataset management processes. The role offers room for growth and impact as the team expands. Compensation is competitive, and benefits include equity and bonuses.
Requirements
- Experience designing and implementing evaluation frameworks for AI/ML systems
- Familiarity with prompt engineering, structured output evaluation, and context-window management in LLM systems
- A high degree of autonomy, plus the ability to collaborate and to translate team goals into clear, testable criteria supported by effective tooling
Responsibilities
- Design and implement robust evaluation frameworks for GenAI and LLM-based systems, including golden test sets, regression tracking, LLM-as-judge methods, and structured output verification
- Develop tooling to enable automated, low-friction evaluation of model outputs, prompts, and agent behaviors
- Define and refine metrics for both structure and semantics, ensuring alignment with realistic use cases and operational constraints
- Lead the development of dataset management processes and guide teams across Grafana in best practices for GenAI evaluation
Preferred Qualifications
- Experience working in environments with rapid iteration and experimental development
- A pragmatic mindset that values reproducibility, developer experience, and thoughtful trade-offs when scaling GenAI systems
- A passion for minimizing human toil and building AI systems that actively support engineers
Benefits