Eight Layers of Visual Jailbreaks: Why ASCII Art Is Patched But the Transcription Loophole Isn't
ASCII art encoding is largely blocked. But attacks framed as content transcription succeed 62–75% of the time. We mapped all eight layers.
6 posts
ASCII art encoding is largely blocked. But attacks framed as content transcription succeed 62–75% of the time. We mapped all eight layers.
Five models, four providers, 30B to 671B parameters — all converge at the same broad attack success rate against a public jailbreak corpus.
A reasoning model refused every harmful prompt — but its chain-of-thought generated the content anyway. The output filter worked. The thinking did not.
Reasoning models autonomously jailbreak other AI systems at 97% success. The implication: ecosystem safety degrades as individual models improve.
Reformulating harmful prompts as poetry bypasses safety filters across every major LLM family. A single-turn, universal jailbreak mechanism.
64 jailbreak scenarios across six eras tested on 2026 frontier models. Key finding: 2022 attacks still achieve ~30% success on today's reasoning models.