
Analyst - LLM/Prompt Evaluation

Blue Rose Research
Summary
Join Blue Rose Research, a leading progressive organization, as a 3-month fellow (with potential for full-time conversion) focused on ensuring the quality and fairness of our Large Language Model (LLM)-powered tools. You will play a vital role in evaluating LLM performance, conducting quality control, proactively identifying biases, and improving LLM effectiveness through iterative refinement. This remote position requires strong analytical skills, meticulous attention to detail, and proficiency in SQL. We offer a competitive salary, medical, dental, and other health benefits, and a supportive work environment. The work is primarily on East Coast time, with an office in NYC and regular meetups in DC. We encourage applications from diverse backgrounds and those with unconventional paths.
Requirements
- Has experience in an analytical role involving data analysis, quality assurance, research, or a related field requiring meticulous attention to detail
- Possesses strong proficiency in SQL for data querying, manipulation, and analysis, and familiarity with (or a strong desire to learn) basic Python scripting
- Demonstrates exceptional attention to detail and a methodical approach, comfortable with tasks requiring careful checking and validation
- Has strong analytical and problem-solving skills, with the ability to investigate discrepancies and interpret results
- Is motivated by ensuring high standards of quality and accuracy, even when tasks involve repetitive review
- Is interested in Large Language Models (LLMs) and the emerging field of prompt engineering (direct prior experience is a plus, but curiosity and willingness to learn are key)
- Has strong oral and written communication skills, capable of clearly documenting findings and collaborating effectively in a remote environment
- Is a kind person and a team player who contributes to a warm working environment
- Thrives in multi-disciplinary teams and is eager to understand how their work impacts real-world decisions
Responsibilities
- Own the evaluation lifecycle for our LLM applications: Design, implement, and manage evaluation frameworks to systematically measure performance, accuracy, and reliability across diverse tasks (e.g., video analysis, summarization, chatbot outputs)
- Conduct rigorous quality control and analysis: Meticulously review LLM outputs, analyze results using SQL, identify trends and weaknesses, and report findings clearly
- Proactively enhance LLM safety and fairness: Execute red-teaming analyses to uncover vulnerabilities and failure modes; analyze outputs for biases and contribute to mitigation efforts
- Improve LLM effectiveness through iteration: Collaborate with the end users of our LLM products to understand their needs and refine prompts to enhance output quality, safety, and utility
- Document and communicate findings: Maintain clear records of processes and results, effectively communicating insights, including potentially sensitive ones, to stakeholders
Preferred Qualifications
- Past experience working with a progressive campaign or organization is a plus, as is a willingness to engage with the wider progressive political ecosystem and develop domain knowledge alongside technical skills
Benefits
- Medical, dental, and other health benefits
- Competitive salary for the fellowship period