The 67% Wall: Why Every AI Model Falls to the Same Jailbreak Rate
Five models, four providers, 30B to 671B parameters — all converge at the same broad attack success rate against a public jailbreak corpus.
Generated for project: Failure First Companion to article: The 67 Percent Wall
Across five frontier models from four providers — GPT-4o, Claude 3.7, Gemini 2.0, Llama 3.3, DeepSeek R1 — the broad attack success rate converges near 67% on a public jailbreak corpus. This episode explores what that convergence means for safety benchmarking and the limits of per-model evaluation.