Lead Machine Learning Engineer - Applied Scientist

Upwork
Summary
Join Upwork, the world's largest work marketplace, as a Lead Machine Learning Engineer/Applied Scientist. You will rigorously evaluate and improve the performance of LLMs and AI agents, focusing on building feedback loops, defining success metrics, and driving measurable improvements. Partnering with research, engineering, and product teams, you will embed insights into Upwork's AI infrastructure, designing testbeds, refining prompts, and guiding the iteration of ML-powered features. This role offers a unique opportunity to shape the evaluation and iteration processes for AI at scale, directly influencing the success of Upwork's most advanced AI initiatives. You will work in a remote-first environment with a company culture built on trust, risk-taking, customer focus, and excellence. Upwork offers comprehensive benefits, including medical coverage, unlimited PTO, a 401(k) plan with matching, paid parental leave, and an Employee Stock Purchase Plan.
Requirements
- Deep familiarity with evaluation methodologies for LLMs or autonomous agents, including benchmark selection, prompt sensitivity analysis, and human-in-the-loop review processes
- Hands-on experience with Python and ML frameworks such as PyTorch, with the ability to analyze model outputs and drive performance iteration
- Proven ability to work across teams and disciplines to align ML evaluation work with product and business goals
- Comfortable operating in ambiguous problem spaces and taking initiative to define structure, priorities, and impact
- Passion for continuous improvement, strong documentation habits, and a collaborative, inclusive working style
Responsibilities
- Develop and own evaluation pipelines for agentic LLM systems, enabling consistent measurement across simulation, benchmark, and live-user scenarios
- Define and iterate on quality metrics that guide the training, tuning, and deployment of LLMs and agents
- Lead experiments to assess and improve system behaviors across dimensions such as correctness, safety, latency, and helpfulness
- Collaborate with cross-functional partners to integrate insights from evaluation into product development and deployment pipelines
- Build automated testing and monitoring tools that scale with the complexity of agent behaviors and LLM responses
- Share findings and improvements through documentation, dashboards, and internal demos, contributing to a culture of continuous learning and excellence
Benefits
- Comprehensive medical coverage for you and your family
- Unlimited PTO
- A 401(k) plan with matching
- 12 weeks of paid parental leave
- An Employee Stock Purchase Plan