Staff Software Engineer

Grafana Labs
Summary
Join Grafana Labs as a Senior Engineer in GenAI & ML Evaluation Frameworks and help build and evolve internal evaluation frameworks or integrate existing best-of-breed tools. You will design and implement robust evaluation frameworks for GenAI and LLM-based systems, design and scale automated evaluation pipelines, and integrate them into CI/CD workflows. You will also develop tooling for automated evaluation, define and refine metrics that reflect product goals and model behavior, and lead dataset management processes. This remote role is open to applicants in US time zones and offers a chance to expand or redefine the position based on impact and initiative. Grafana Labs is a remote-first, open-source company with a global, collaborative culture, and it offers competitive compensation and benefits.
Requirements
- Experience designing and implementing evaluation frameworks for AI/ML systems
- Familiarity with prompt engineering, structured output evaluation, and context-window management in LLM systems
- A high degree of autonomy, with the ability to collaborate and translate team goals into clear, testable criteria supported by effective tooling
Responsibilities
- Design and implement robust evaluation frameworks for GenAI and LLM-based systems, including golden test sets, regression tracking, LLM-as-judge methods, and structured output verification
- Develop tooling to enable automated, low-friction evaluation of model outputs, prompts, and agent behaviors
- Define and refine metrics for both structure and semantics, ensuring alignment with realistic use cases and operational constraints
- Lead the development of dataset management processes and guide teams across Grafana in best practices for GenAI evaluation
Preferred Qualifications
- Experience working in environments with rapid iteration and experimental development
- A pragmatic mindset that values reproducibility, developer experience, and thoughtful trade-offs when scaling GenAI systems
- A passion for minimizing human toil and building AI systems that actively support engineers
Benefits
- Equity
- Bonus (if applicable)
- Restricted Stock Units (RSUs)
- 100% Remote, Global Culture
- Scaling Organization
- Transparent Communication
- Innovation-Driven
- Open Source Roots
- Empowered Teams
- Career Growth Pathways
- Approachable Leadership
- Passionate People
- In-Person Onboarding
- Balance is Key - We operate a global annual leave policy of 30 days per year. Three days of your annual leave entitlement are reserved for Grafana Shutdown Days, so the whole team can really disconnect. (We will comply with local legislation where applicable.)