Failure First
3 projects
Adversarial evaluation framework for AI. 257 models, 142k prompts, 346 attack techniques, 140k FLIP-graded results.
Why do people acknowledge evidence of harm and then proceed as if it doesn't exist? A deep dive into structural risk dismissal.
What safety architecture does AI-assisted trauma therapy require before it has any business existing? Built to find out.