Staff Software Engineer

Grafana Labs

💵 $148k-$178k
📍 Remote - United States

Summary

Join Grafana Labs as a Senior Engineer in GenAI & ML Evaluation Frameworks to build and evolve internal evaluation frameworks or integrate existing best-of-breed tools. You will design and scale automated evaluation pipelines for GenAI and LLM-based systems, integrate them into CI/CD workflows, develop tooling for automated evaluation, define and refine metrics that reflect product goals and model behavior, and lead dataset management processes. This remote role is open to applicants in US time zones and can be expanded or redefined based on impact and initiative. Grafana Labs is a remote-first, open-source company with a global, collaborative culture, and it offers competitive compensation and benefits.

Requirements

  • Experience designing and implementing evaluation frameworks for AI/ML systems
  • Familiarity with prompt engineering, structured output evaluation, and context-window management in LLM systems
  • High autonomy, with the ability to collaborate across teams and translate team goals into clear, testable criteria supported by effective tooling

Responsibilities

  • Design and implement robust evaluation frameworks for GenAI and LLM-based systems, including golden test sets, regression tracking, LLM-as-judge methods, and structured output verification
  • Develop tooling to enable automated, low-friction evaluation of model outputs, prompts, and agent behaviors
  • Define and refine metrics for both structure and semantics, ensuring alignment with realistic use cases and operational constraints
  • Lead the development of dataset management processes and guide teams across Grafana in best practices for GenAI evaluation

Preferred Qualifications

  • Experience working in environments with rapid iteration and experimental development
  • A pragmatic mindset that values reproducibility, developer experience, and thoughtful trade-offs when scaling GenAI systems
  • A passion for minimizing human toil and building AI systems that actively support engineers

Benefits

  • Equity
  • Bonus (if applicable)
  • Restricted Stock Units (RSUs)
  • 100% Remote, Global Culture
  • Scaling Organization
  • Transparent Communication
  • Innovation-Driven
  • Open Source Roots
  • Empowered Teams
  • Career Growth Pathways
  • Approachable Leadership
  • Passionate People
  • In-Person Onboarding
  • Balance is Key - We operate a global annual leave policy of 30 days. Three days of your annual leave entitlement are reserved for Grafana Shutdown Days so the team can truly disconnect. We will comply with local legislation where applicable.
