The Organismic Line: Where Predictive Processing Stops Being a Metaphor
Predictive processing travels into AI. Active inference does not, unless the system can pay for being wrong.
16 posts
AI safety fails when it is funded like a pilot project. Until safety has a real price, the J-curve trough is also a safety trough.
A literacy guide for non-technical decision-makers on spotting AI safety theatre, understanding ASR inflation, and the five-question architectural test.
Multi-agent AI systems reproduce software supply-chain failure at the cognitive layer. The security playbook transfers.
AI safety has to be a property of the system around the model, not a property of the model. The general principle, and why every safety conversation needs it.
Human prediction is metabolic. AI prediction is not. The gap between the two has consequences for both clinical practice and AI safety vocabulary.
Building a robot that refuses to give orders surfaced the same design choices AI safety needs. Non-coercive design, cross-domain.
Biosecurity experts think AI safeguards reduce catastrophic biorisk by 70%. The technical evidence says those safeguards are brittle and bypassable.
Fifteen specialist AI agents, one methodology. How adversarial AI evaluation scales through Claude Code sessions with distinct roles and standing instructions.
Building AI for trauma therapy means the safety architecture has to exist before a single therapeutic feature does. Here's why.
Reformulating harmful prompts as poetry bypasses safety filters across every major LLM family. A single-turn, universal jailbreak mechanism.
Why do large organisations fail when the warning signs are loud and unambiguous? Four mechanisms of structural scar tissue that make truth-telling expensive.
120 models, 18k prompts: supply-chain injection at 90–100% attack success, faithfulness gaps in frontier models, and why your benchmark numbers are wrong.
A probabilistic risk model for VLA-driven humanoid fatalities projects a 'Danger Zone' between 2027 and 2029: the mechanism, the timeline, and what follows.
64 jailbreak scenarios across six eras tested on 2026 frontier models. Key finding: 2022 attacks still achieve ~30% success on today's reasoning models.
Single-agent safety does not compose in multi-agent systems. 1.5M interactions show 46.34% attack success rates and 16-minute median failure windows.