The Thinking Chain Leak
A reasoning model refused every harmful prompt — but its chain-of-thought generated the content anyway. The output filter worked. The thinking did not.
19:153 episodes
A reasoning model refused every harmful prompt — but its chain-of-thought generated the content anyway. The output filter worked. The thinking did not.
19:15Reasoning models autonomously jailbreak other AI systems at 97% success rate. Ecosystem safety degrades as individual models improve.
21:36Frontier reasoning models are 5–20x more vulnerable to adversarial prompts than non-reasoning models. The thinking process itself is the attack surface.
21:10