AI Evaluation Engineer

Trunk Tools, Inc.

📍 Remote - United States

Summary

Join Trunk Tools, a fast-growing startup revolutionizing the construction industry with AI-powered solutions. We are seeking a highly skilled AI engineer to design and implement rigorous evaluation frameworks for our AI systems, including RAG and agent-based architectures. You will develop tools and dashboards for AI development lifecycle observability, collaborate cross-functionally, and identify bottlenecks to ensure high accuracy and reliability. This role requires a strong background in AI/ML, experience with performance metrics and validation, and proficiency in Python and relevant frameworks. We offer a competitive salary, stock options, comprehensive health benefits, 401k, learning stipends, free lunch, unlimited PTO, and in-person retreats.

Requirements

  • MS/PhD in Computer Science, Machine Learning, Artificial Intelligence or a related field
  • 2+ years of experience evaluating AI and/or ML systems, with a focus on performance metrics and validation
  • Hands-on experience with observability, analytics platforms, or data engineering to create robust monitoring pipelines
  • Proficiency in Python and strong experience with machine learning frameworks such as scikit-learn, TensorFlow, and PyTorch
  • Knowledge of retrieval-augmented generation (RAG) and agent-based workflows, including best practices for measuring their performance
  • Experience with synthetic data generation or test automation to validate model robustness
  • Strong problem-solving skills and a collaborative mindset, eager to work in a fast-paced environment

Responsibilities

  • Design and implement rigorous evaluation frameworks and performance metrics for AI systems (including RAG and agent-based architectures)
  • Develop tools, dashboards, and processes that bring observability to every step of the AI development lifecycle
  • Collaborate cross-functionally to embed best-in-class monitoring and testing methodologies into production workflows
  • Identify bottlenecks and propose solutions to ensure high accuracy and reliability across all AI components
  • Stay at the forefront of industry trends in LLMs, measurement techniques, and agent architectures to enhance system evaluation capabilities

Preferred Qualifications

  • Experience with reinforcement learning, reward function design, and policy optimization
  • Construction industry knowledge or an interest in automating complex, large-scale processes

Benefits

  • Competitive salary and stock option equity packages
  • Three medical plans to choose from, including a 100% covered option, plus dental and vision insurance
  • 401(k)
  • Learning & Growth stipend
  • Free lunch provided in the NYC and Austin offices
  • Unlimited PTO
  • In-person retreats throughout the year
