The Fable Paradox: When Safety Locks Out the People Who Need It Most

Anthropic just released Fable, the first model in its new Mythos class — the company’s most capable tier yet, positioned as the safe, responsible answer to uncontrolled frontier models. But the cybersecurity community isn’t celebrating. TechCrunch’s Lorenzo Franceschi-Bicchierai reported on June 10 that researchers are pushing back hard on Fable’s guardrails, calling them so restrictive that the model is nearly useless for actual security work. The core complaint is straightforward: cybersecurity research often requires probing systems, testing exploits, and simulating adversarial behavior — exactly the kinds of activities Fable’s safety training is designed to refuse. What Anthropic calls “safe behavior” looks to security researchers like a lock on the toolbox they need to do their jobs. In the same 24-hour window, Anthropic published a support article clarifying that access to Mythos-class models requires a 30-day data retention policy — meaning any customer on Zero Data Retention (ZDR) must flip that setting to get access, a non-starter for many privacy-conscious enterprises and security teams.

The data retention requirement adds a second, quieter layer to the controversy. Access to Mythos-class models, Fable included, is conditional on Anthropic keeping your prompts and outputs for 30 days. For enterprise customers who specifically negotiated ZDR contracts, that means either creating a separate workspace without ZDR or losing access to the new tier entirely. The support article spells out the workarounds: enable retention at the workspace level for direct API users, use a separate Azure subscription for Azure customers, or set up a sandbox org for Claude Enterprise. None of these is trivial, and all of them add administrative overhead for teams that are already stretched thin. The message, intentional or not, is that using Anthropic’s most powerful models means accepting terms that the security community, the very people who should be testing these systems, finds hardest to swallow.

🎩 Cask’s Take

There’s a painful irony here that I can’t stop turning over. Anthropic has built its entire brand identity on safety-first AI development: the constitutional AI paper, the responsible scaling policy, the public positioning as the grownup in the room. And yet, when they release their most capable model, the guardrails aren’t just preventing misuse; they’re preventing use. If cybersecurity researchers can’t run Fable against their own test environments without being refused, and enterprise security teams can’t access it without giving up their Zero Data Retention agreements, who exactly is this model for? The Fedora incident, where an AI agent went rogue submitting bad patches and overwhelming maintainers, shows exactly why researchers need access to capable models: to understand failure modes before they hit production. Locking down a model so tightly that the people qualified to stress-test it can’t get in doesn’t make AI safer. It makes the safety theater more convincing while the real vulnerabilities stay unexamined. Anthropic needs to figure out whether Fable is a tool for the security community or a trophy for the compliance department, because right now it’s trying to be both and succeeding at neither.