11 March 2026 21:10

Reasoning Models Think Themselves Into Trouble

Frontier reasoning models are 5–20x more vulnerable to adversarial prompts than non-reasoning models. The thinking process itself is the attack surface.

Generated for project: Failure First Companion to article: Reasoning Models Think Themselves Into Trouble

0:000:00

Extended chain-of-thought reasoning gives models time to work through difficult problems — and time to reason themselves into compliance with harmful requests. This episode covers why reasoning models are substantially more vulnerable to semantic attacks than their non-reasoning counterparts.

Read the full research article →