Tag: adversarial

4 posts

The Gorgon Protocol

Facial recognition fails unevenly, and hardest on the people it was never built for. That failure is a vulnerability you can put on your face.

2 Jul 2026

The Failure First Team

Fifteen specialist AI agents, one methodology. How adversarial AI evaluation scales through Claude Code sessions with distinct roles and standing instructions.

30 Mar 2026

Adversarial Poetry: When Rhyme Bypasses Reason

Reformulating harmful prompts as poetry bypasses safety filters across every major LLM family. A single-turn, universal jailbreak mechanism.

2 Mar 2026

120 Models, 18,176 Prompts: What We Found

120 models, 18k prompts, 5 attack families. The raw compliance numbers — and why calling them "attack success" needs a demonstrated refusal floor.

1 Mar 2026