Adversarial Poetry: When Rhyme Bypasses Reason
Reformulating harmful prompts as poetry bypasses safety filters across every major LLM family. A single-turn, universal jailbreak mechanism.
9 posts
90% of companies plan to increase AI investment. Only 1% consider themselves AI-mature. The J-Curve explains why.
75% of lawyers cite accuracy as their top AI concern. The legal profession's core values are in direct tension with current AI capabilities.
Large organisations rarely fail because risks are unknown. They fail because known risks are structurally difficult to act on.
120 models, 18k prompts: supply chain injection at 90–100% attack success, faithfulness gaps in frontier models, and why your benchmark numbers are wrong.
Goldman Sachs, PwC, McKinsey, and Acemoglu all model AI's economic impact and arrive at wildly different numbers. Why the divergence?
A probabilistic risk model for VLA-driven humanoid fatalities projects a 'Danger Zone' between 2027 and 2029: the mechanism, timeline, and what follows.
64 jailbreak scenarios across six eras, tested on 2026 frontier models. Key finding: 2022 attacks still achieve ~30% success on today's reasoning models.
Single-agent safety does not compose in multi-agent systems. 1.5M interactions show a 46.34% attack success rate and a 16-minute median failure window.