
Eval Engineer

Braintrust

San Francisco | Full-time | 47d ago

About the role

ABOUT THE COMPANY

Braintrust is the AI observability platform. By connecting evals and observability in one workflow, Braintrust gives builders the visibility to understand how AI behaves in production and the tools to improve it. Teams at Notion, Stripe, Zapier, Vercel, and Ramp use Braintrust to compare models, test prompts, and catch regressions, turning production data into better AI with every release.

ABOUT THE ROLE

We’re hiring an Eval Engineer to design and run creative evaluations of new AI capabilities. Your job is to turn emerging AI ideas into measurable experiments and publish the results for the developer ecosystem.

When new models, agents, or frameworks appear, everyone has opinions about what works, but few people actually test them. This role exists to change that.

You’ll design experiments that compare models, prompts, and agent architectures against real tasks. You’ll build the datasets, scoring logic, and evaluation harnesses. Then you’ll publish the results
