Open Source AI Engineer

Arize AI

💵 $150k-$185k
📍 Remote - Worldwide

Summary

Join Arize's growing open-source team as an Open Source AI Engineer to build and improve frameworks, metrics, and tooling for Large Language Models (LLMs). You will design and open-source new libraries and APIs for evaluating LLM output, define benchmarks and metrics for optimizing AI tasks, collaborate with the AI open-source community, prototype and iterate on LLM techniques, integrate with Arize's platform to improve observability and debugging, and educate developers through blog posts, tutorials, and documentation. This role requires hands-on LLM experience, strong programming skills in Python (TypeScript is a plus), and knowledge of NLP evaluation methods. Arize offers a fully remote, flexible work environment, competitive salary and equity, and a comprehensive benefits package.

Requirements

  • Hands-on LLM Experience: Familiarity with popular LLM frameworks, prompt engineering techniques, and model fine-tuning
  • Strong Programming Skills: Fluent in Python for AI workflows
  • Evaluation Knowledge: Understanding of core NLP evaluation methods and experience applying or extending them for LLM systems
  • Open Source Track Record: Contributions to open source projects, personal GitHub repos with interesting AI demos, or a history of active engagement in developer communities
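To illustrate the kind of evaluation knowledge the role calls for, here is a minimal sketch of one core NLP metric often adapted for scoring LLM output against a reference answer: token-overlap F1. This is an illustrative example only, not part of Arize's tooling; the function name and tokenization (lowercase whitespace split) are assumptions for the sketch.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model's answer and a reference answer.

    A classic NLP evaluation metric, frequently extended for LLM systems.
    Tokenization here is a naive lowercase whitespace split.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a perfect match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection gives the number of shared tokens.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Same tokens in any order score 1.0; disjoint answers score 0.0.
print(token_f1("Paris is the capital of France",
               "the capital of France is Paris"))  # 1.0
```

In practice, a production eval library would layer better tokenization, golden datasets, and aggregate reporting on top of per-example metrics like this one.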

Responsibilities

  • Build LLM Eval Frameworks: Design, architect, and open-source new libraries, pipelines, and APIs that make it simpler to evaluate LLM output quality, consistency, and reliability at scale
  • Define Metrics and Benchmarks: Curate golden datasets and develop robust benchmarked metrics that guide data scientists and AI practitioners in optimizing their AI tasks
  • Collaborate with the Community: Partner closely with the broader AI open source ecosystem, gather feedback, review pull requests, and steer the direction of the project to address real developer needs
  • Prototype and Iterate Rapidly: Experiment with state-of-the-art LLM techniques, turning research into practical developer tooling
  • Improve Observability and Debugging: Integrate with our existing platform to surface deeper insights on LLM behavior, helping teams quickly diagnose and fix issues such as hallucinations or bias
  • Educate and Evangelize: Write blog posts, white papers, tutorials, and documentation to help developers succeed with our open source tools and grow the LLM eval community

Preferred Qualifications

  • TypeScript Proficiency: Bonus if you can navigate TypeScript as well as Python
  • ML Observability & Tools: Familiarity with debugging AI applications, exploring embeddings, or building data-heavy dashboards is a plus

Benefits

  • Fully Remote, Flexible Environment: We are a fully remote company with offices in the Bay Area and NYC for those who prefer in-person collaboration
  • Cutting-Edge Challenges: Our platform already helps analyze millions of AI predictions daily, giving you the chance to refine your evaluation tooling on real, large-scale production workloads
  • Work With a Talented, Passionate Team: Collaborate closely with top engineers who are dedicated to making AI more transparent, reliable, and impactful
  • Medical, dental, vision
  • 401(k) plan
  • Unlimited paid time off
  • Generous parental leave plan
  • Additional benefits for mental health and wellness support
  • Monthly WFH stipend to pay for co-working spaces for remote employees
