Software Engineer, Safeguards Evals
Anthropic
San Francisco, CA | New York City, NY
8d ago
About Anthropic
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
About the role
How do we know our safety systems actually catch misuse? Anthropic increasingly uses AI to investigate potential misuse of Claude — analyzing real-world traffic to surface bad actors, policy violations, and emerging threats. Its findings inform enforcement actions and model launch decisions, which means we need rigorous, trustworthy answers to questions like: Does the monitoring agent catch what it should? Where does it fail? Does it stay reliable as adversaries a