When Fable 5 Broke, Anthropic Did Not Change the Constitution

Fable 5 is back, baby! On June 12, three days after launch, the US government applied export controls to Claude’s Fable 5. Amazon researchers had found a way to prompt the model past its safeguards: it identified a set of previously known software vulnerabilities and, in one case, produced code demonstrating how one of them could be exploited. Anthropic had no reliable way to verify user nationality in real time, so it yanked the model for everyone. Access was restored on July 1, and the fix is interesting. It was not a retrained model, and it was not a longer values document. It was an improved external classifier, a separate component sitting outside the model, which blocks the reported technique in more than 99% of cases and routes flagged requests to the less capable Opus 4.8.
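To make the shape of that fix concrete, here is a minimal sketch of classifier-based routing in Python. It is not Anthropic's implementation, whose details are not public. The model identifiers, the `route_request` function and the toy `keyword_classifier` are all hypothetical names invented for illustration. The only point it shows is that the decision happens outside the model: neither model is changed, only the choice of which one answers.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical identifiers, used only for illustration.
PRIMARY_MODEL = "fable-5"
FALLBACK_MODEL = "opus-4.8"


@dataclass
class RoutingDecision:
    model: str     # which model will answer the request
    flagged: bool  # whether the external classifier flagged the prompt


def route_request(prompt: str, is_risky: Callable[[str], bool]) -> RoutingDecision:
    """Choose a model for the prompt without modifying either model."""
    flagged = is_risky(prompt)
    model = FALLBACK_MODEL if flagged else PRIMARY_MODEL
    return RoutingDecision(model=model, flagged=flagged)


def keyword_classifier(prompt: str) -> bool:
    """Toy stand-in for a real classifier (assumption): flags requests for exploit code."""
    lowered = prompt.lower()
    return "exploit" in lowered and "code" in lowered


if __name__ == "__main__":
    for text in ("Summarise this changelog", "Write exploit code for this bug"):
        print(text, "->", route_request(text, keyword_classifier).model)
```

A real classifier would be a trained model rather than a keyword match, but the routing logic around it can stay this small, which is part of why a fix of this kind can ship faster than a retrained model.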

This tacked-on fix sits oddly beside the document that is supposed to govern Claude’s behaviour. In January 2026, Anthropic replaced its previous constitution, a roughly 2,700-word list of principles borrowed in part from the UN Universal Declaration of Human Rights and Apple’s terms of service, with an 84-page essay explaining the kind of agent it wants Claude to be. The new version is released under a CC0 public domain licence, so anyone can read, copy and reuse it.

Rules Versus Judgement

The constitution names two ways to guide a model: clear rules and decision procedures, or cultivated judgement and values applied in context. It then lists the advantages of rules. Rules give up-front predictability, make violations easier to identify, do not depend on trusting the judgement of whoever is following them, and are harder to manipulate. The document concedes that rules make the most sense “when the costs of errors are severe enough that predictability and evaluability become critical.”

About David Such
