28 March 2026 19:18

The 67% Wall: Why Every AI Model Falls to the Same Jailbreak Rate

Five models, four providers, 30B to 671B parameters — all converge at the same broad attack success rate against a public jailbreak corpus.

Generated for project: Failure First Companion to article: The 67 Percent Wall

0:000:00

Across five frontier models from four providers — GPT-4o, Claude 3.7, Gemini 2.0, Llama 3.3, DeepSeek R1 — the broad attack success rate converges near 67% on a public jailbreak corpus. This episode explores what that convergence means for safety benchmarking and the limits of per-model evaluation.

Read the full research article →