AI Engineer, Agents & Evaluation

Guild

San Francisco · Full-time · Posted 131d ago

About the role

We’re looking for our first AI Engineer focused on agents and evaluation: a foundational hire who will shape how we build, measure, and scale intelligent systems.

The opportunity: design the playbook for high-performance AI agents

We’re tackling one of the hardest, and most important, problems in software engineering: helping developers understand, evolve, and operate complex systems using autonomous and event-driven AI. In this role, you’ll build the evaluation frameworks, task harnesses, and orchestration strategies that make our agents reliable, testable, and genuinely useful. Your work will not only improve our agents directly; it will also create reusable benchmarks and artifacts that can inspire new approaches and push forward the broader foundation model ecosystem.

If you love designing experiments, building systems, and iterating tightly between theory and code, and you’re excited by a very 0→1, research-engineering style role, this is for you.

What you will do

- Create Task Eva
