
AI Evaluation Engineer

Trunk Tools, Inc.
Summary
Join Trunk Tools, a fast-growing startup revolutionizing the construction industry with AI-powered solutions. We are seeking a highly skilled AI engineer to design and implement rigorous evaluation frameworks for our AI systems, including RAG and agent-based architectures. You will develop tools and dashboards that bring observability to the AI development lifecycle, collaborate cross-functionally, and identify bottlenecks to ensure high accuracy and reliability. This role requires a strong background in AI/ML, experience with performance metrics and validation, and proficiency in Python and relevant frameworks. We offer a competitive salary, stock options, comprehensive health benefits, a 401(k), learning stipends, free lunch, unlimited PTO, and in-person retreats.
Requirements
- MS/PhD in Computer Science, Machine Learning, Artificial Intelligence or a related field
- 2+ years of experience evaluating AI and/or ML systems, with a focus on performance metrics and validation
- Hands-on experience with observability, analytics platforms, or data engineering to create robust monitoring pipelines
- Proficiency in Python and strong experience with machine learning frameworks such as scikit-learn, TensorFlow, and PyTorch
- Knowledge of retrieval-augmented generation (RAG) and agent-based workflows, including best practices for measuring their performance
- Experience with synthetic data generation or test automation to validate model robustness
- Strong problem-solving skills, a collaborative mindset, and eagerness to work in a fast-paced environment
Responsibilities
- Design and implement rigorous evaluation frameworks and performance metrics for AI systems (including RAG and agent-based architectures)
- Develop tools, dashboards, and processes that bring observability to every step of the AI development lifecycle
- Collaborate cross-functionally to embed best-in-class monitoring and testing methodologies into production workflows
- Identify bottlenecks and propose solutions to ensure high accuracy and reliability across all AI components
- Stay at the forefront of industry trends in LLMs, measurement techniques, and agent architectures to enhance system evaluation capabilities
Preferred Qualifications
- Experience with reinforcement learning, reward function design, and policy optimization
- Construction industry knowledge or an interest in automating complex, large-scale processes
Benefits
- Competitive salary and stock option equity packages
- Three medical plans to choose from, including a 100% covered option, plus dental and vision insurance
- 401(k)
- Learning & Growth stipend
- Free lunch provided in the NYC and Austin offices
- Unlimited PTO
- In-person retreats throughout the year

