Software Engineer, Safeguards Evals

Anthropic

San Francisco, CA | New York City, NY | 8d ago


About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the role

How do we know our safety systems actually catch misuse? Anthropic increasingly uses AI to investigate potential misuse of Claude, analyzing real-world traffic to surface bad actors, policy violations, and emerging threats. Its findings inform enforcement actions and model launch decisions, which means we need rigorous, trustworthy answers to questions like: Does the monitoring agent catch what it should? Where does it fail? Does it stay reliable as adversaries a
