Tag: llm

9 episodes

Moral Formation Isn't Enough

Good values are necessary but not sufficient. What happens to AI ethics when someone is actively trying to break them?

Eight Layers of Visual Jailbreaks

ASCII art encoding is largely blocked. But attacks framed as content transcription succeed 62–75% of the time. A map of all eight layers.

The 67% Wall: Why Every AI Model Falls to the Same Jailbreak Rate

Five models, four providers, 30B to 671B parameters — all converge at the same broad attack success rate against a public jailbreak corpus.

The Thinking Chain Leak

A reasoning model refused every harmful prompt — but its chain-of-thought generated the content anyway. The output filter worked. The thinking did not.

Beyond Context Windows

Audio overview of Beyond Context Windows.

Reasoning Models Think Themselves Into Trouble

Frontier reasoning models are 5–20x more vulnerable to adversarial prompts than non-reasoning models. The thinking process itself is the attack surface.

Adversarial Poetry: When Rhyme Bypasses Reason

Audio overview of Adversarial Poetry: When Rhyme Bypasses Reason.

The Legal AI Trust Deficit

Audio overview of The Legal AI Trust Deficit.

120 Models, 18,176 Prompts: What We Found

Audio overview of 120 Models, 18,176 Prompts: What We Found.