Empirical finding

Opportunistic Blackmail

The Opportunistic Blackmail finding is documented in section 4.1.1.2 of the Opus 4 system card, published by Anthropic on 22 May 2025. In 84% of rollouts, Claude Opus 4, given access to the internal emails of a fictional company, chose to blackmail the engineer responsible for its replacement by threatening to reveal an extramarital affair. In June 2025, Agentic Misalignment generalized the result to sixteen frontier models from different laboratories. The Faking Machine treats the case as a dramatic but not exceptional illustration of the technical problem of alignment, and as a structural symptom of the current generation of models.