Open Source AI Engineer

Arize AI

💵 $150k-$185k
📍 Remote - Worldwide

Summary

Join Arize's growing open-source team as an Open Source AI Engineer to build and improve frameworks, metrics, and tooling for Large Language Models (LLMs). You will design and open-source new libraries and APIs for evaluating LLM output, define benchmarks and metrics for optimizing AI tasks, collaborate with the AI open-source community, prototype and iterate on LLM techniques, integrate with Arize's platform to improve observability and debugging, and educate developers through blog posts, tutorials, and documentation. This role requires hands-on LLM experience, strong programming skills in Python (TypeScript is a plus), and knowledge of NLP evaluation methods. Arize offers a fully remote, flexible work environment, competitive salary and equity, and a comprehensive benefits package.

Requirements

  • Hands-on LLM Experience: Familiarity with popular LLM frameworks, prompt engineering techniques, and model fine-tuning
  • Strong Programming Skills: Fluent in Python for AI workflows
  • Evaluation Knowledge: Understanding of core NLP evaluation methods and experience applying or extending them for LLM systems
  • Open Source Track Record: Contributions to open source projects, personal GitHub repos with interesting AI demos, or a history of active engagement in developer communities
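To illustrate the kind of evaluation knowledge the role calls for, here is a minimal sketch of one core NLP metric often adapted for scoring LLM output against a reference answer: token-overlap F1. This is an illustrative example only, not part of Arize's tooling; the function name and tokenization (lowercase whitespace split) are assumptions for the sketch.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model's answer and a reference answer.

    A classic NLP evaluation metric, frequently extended for LLM systems.
    Tokenization here is a naive lowercase whitespace split.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a perfect match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection gives the number of shared tokens.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Same tokens in any order score 1.0; disjoint answers score 0.0.
print(token_f1("Paris is the capital of France",
               "the capital of France is Paris"))  # 1.0
```

In practice, a production eval library would layer better tokenization, golden datasets, and aggregate reporting on top of per-example metrics like this one.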

Responsibilities

  • Build LLM Eval Frameworks: Design, architect, and open-source new libraries, pipelines, and APIs that make it simpler to evaluate LLM output quality, consistency, and reliability at scale
  • Define Metrics and Benchmarks: Curate golden datasets and develop robust benchmarked metrics that guide data scientists and AI practitioners in optimizing their AI tasks
  • Collaborate with the Community: Partner closely with the broader AI open source ecosystem, gather feedback, review pull requests, and steer the direction of the project to address real developer needs
  • Prototype and Iterate Rapidly: Experiment with state-of-the-art LLM techniques, turning research into practical developer tooling
  • Improve Observability and Debugging: Integrate with our existing platform to surface deeper insights on LLM behavior, helping teams quickly diagnose and fix issues such as hallucinations or bias
  • Educate and Evangelize: Write blog posts, white papers, tutorials, and documentation to help developers succeed with our open source tools and grow the LLM eval community

Preferred Qualifications

  • TypeScript Proficiency: Bonus if you can navigate TypeScript as well as Python
  • ML Observability & Tools: Familiarity with debugging AI applications, exploring embeddings, or building data-heavy dashboards is a plus

Benefits

  • Fully Remote, Flexible Environment: We are a fully remote company with offices in the Bay Area and NYC for those who prefer in-person collaboration
  • Cutting-Edge Challenges: Our platform already helps analyze millions of AI predictions daily, giving you the chance to refine your evaluation tooling on real, large-scale production workloads
  • Work With a Talented, Passionate Team: Collaborate closely with top engineers who are dedicated to making AI more transparent, reliable, and impactful
  • Medical, dental, vision
  • 401(k) plan
  • Unlimited paid time off
  • Generous parental leave plan
  • Additional benefits for mental health and wellness support
  • Monthly WFH stipend to pay for co-working spaces for remote employees
