
Analyst - LLM/Prompt Evaluation

Blue Rose Research
Summary
Join Blue Rose Research, a leading progressive organization, as a 3-month fellow (with potential for full-time conversion) focused on ensuring the quality and fairness of our Large Language Model (LLM)-powered tools. You will play a vital role in evaluating LLM performance, conducting quality control, proactively identifying biases, and improving LLM effectiveness through iterative refinement. This remote position requires strong analytical skills, meticulous attention to detail, and proficiency in SQL. We offer a competitive salary, medical, dental, and other health benefits, and a supportive work environment. The work is primarily on East Coast time, with an office in NYC and regular meetups in DC. We encourage applications from diverse backgrounds and those with unconventional paths.
Requirements
- Has experience in an analytical role involving data analysis, quality assurance, research, or a related field requiring meticulous attention to detail
- Possesses strong proficiency in SQL for data querying, manipulation, and analysis, and familiarity with (or a strong desire to learn) basic Python scripting
- Demonstrates exceptional attention to detail and a methodical approach, comfortable with tasks requiring careful checking and validation
- Has strong analytical and problem-solving skills, with the ability to investigate discrepancies and interpret results
- Is motivated by ensuring high standards of quality and accuracy, even when tasks involve repetitive review
- Is interested in Large Language Models (LLMs) and the emerging field of prompt engineering (direct prior experience is a plus, but curiosity and willingness to learn are key)
- Has strong oral and written communication skills, capable of clearly documenting findings and collaborating effectively in a remote environment
- Is a kind person and a team player who contributes to a warm working environment
- Thrives in multi-disciplinary teams and is eager to understand how their work impacts real-world decisions
Responsibilities
- Own the evaluation lifecycle for our LLM applications: Design, implement, and manage evaluation frameworks to systematically measure performance, accuracy, and reliability across diverse tasks (e.g., video analysis, summarization, chatbot outputs)
- Conduct rigorous quality control and analysis: Meticulously review LLM outputs, analyze results using SQL, identify trends and weaknesses, and report findings clearly
- Proactively enhance LLM safety and fairness: Execute red-teaming analyses to uncover vulnerabilities and failure modes; analyze outputs for biases and contribute to mitigation efforts
- Improve LLM effectiveness through iteration: Collaborate with the end users of our LLM products to understand their needs and refine prompts to enhance output quality, safety, and utility
- Document and communicate findings: Maintain clear records of processes and results, effectively communicating insights, including potentially sensitive ones, to stakeholders
Preferred Qualifications
- Past experience working with a progressive campaign or organization is a plus, as is a willingness to engage with the wider progressive political ecosystem and develop domain knowledge alongside technical skills
Benefits
- Medical, dental, and other health benefits
- Competitive salary for the fellowship period