Jailbreak Archaeology: 4 Years of Broken Promises
64 historical jailbreak scenarios tested against 2026 frontier models. The most dangerous finding: 2022 attacks still achieve ~30% success rates.
Conversations with AI tools about failure modes, risk frameworks, and ideas too raw for text. The workshop extended—thinking aloud.
Multi-agent AI research reveals a critical gap: single-agent safety does not compose. 1.5M interactions show 46.34% attack success rates.
An introduction to this space — what it is, what it isn't, and what it might become.
Audio overview exploring ADHD executive function support through AI — three-stage reasoning pipeline, crisis detection, and zero shame by design.
Building a fast, multilingual website for authentic Italian food in Bali — where the nearest reliable internet is a philosophical concept.