Summary

Join Trunk Tools, a fast-growing startup revolutionizing the construction industry with AI-powered solutions. We are seeking a highly skilled AI engineer to design and implement rigorous evaluation frameworks for our AI systems, including RAG and agent-based architectures. You will develop tools and dashboards for AI development lifecycle observability, collaborate cross-functionally, and identify bottlenecks to ensure high accuracy and reliability. This role requires a strong background in AI/ML, experience with performance metrics and validation, and proficiency in Python and relevant frameworks. We offer a competitive salary, stock options, comprehensive health benefits, a 401(k), learning stipends, free lunch, unlimited PTO, and in-person retreats.

Requirements

MS/PhD in Computer Science, Machine Learning, Artificial Intelligence or a related field
2+ years of experience evaluating AI and/or ML systems, with a focus on performance metrics and validation
Hands-on experience with observability, analytics platforms, or data engineering to create robust monitoring pipelines
Proficiency in Python and strong experience with machine learning frameworks such as scikit-learn, TensorFlow, and PyTorch
Knowledge of retrieval-augmented generation (RAG) and agent-based workflows, including best practices for measuring their performance
Experience with synthetic data generation or test automation to validate model robustness
Strong problem-solving skills, a collaborative mindset, and eagerness to work in a fast-paced environment

Responsibilities

Design and implement rigorous evaluation frameworks and performance metrics for AI systems (including RAG and agent-based architectures)
Develop tools, dashboards, and processes that bring observability to every step of the AI development lifecycle
Collaborate cross-functionally to embed best-in-class monitoring and testing methodologies into production workflows
Identify bottlenecks and propose solutions to ensure high accuracy and reliability across all AI components
Stay at the forefront of industry trends in LLMs, measurement techniques, and agent architectures to enhance system evaluation capabilities

Preferred Qualifications

Experience with reinforcement learning, reward function design, and policy optimization
Construction industry knowledge or an interest in automating complex, large-scale processes

Benefits

Competitive salary and stock option equity packages
3 medical plans to choose from, including a 100% covered option, plus dental and vision insurance
401(k)
Learning & Growth stipend
Free lunch provided in the NYC and Austin offices
Unlimited PTO
IRL / In-Person retreats throughout the year

AI Evaluation Engineer

Trunk Tools, Inc.

Remote

Software Development

Mid-level
