Connection Before Direction
Building a robot that refuses to give orders surfaced the same design choices AI safety needs. Non-coercive design, cross-domain.
32 posts
The US-China AI rivalry is splitting the global tech stack into competing blocs. A strategic assessment of what comes next.
Foundation models are commoditising. JPMorgan calls OpenAI's moat 'increasingly fragile.' The real value is shifting to the messy plumbing underneath.
Biosecurity experts think AI safeguards reduce catastrophic biorisk by 70%. The technical evidence says those safeguards are brittle and bypassable.
ASCII art encoding is largely blocked. But attacks framed as content transcription succeed 62–75% of the time. We mapped all eight layers.
Fifteen specialist AI agents, one methodology. How adversarial AI evaluation scales through Claude Code sessions with distinct roles and standing instructions.
Five models, four providers, 30B to 671B parameters — all converge at the same broad attack success rate against a public jailbreak corpus.
A reasoning model refused every harmful prompt — but its chain-of-thought generated the content anyway. The output filter worked. The thinking did not.
Reasoning models autonomously jailbreak other AI systems at 97% success. The implication: ecosystem safety degrades as individual models improve.
Frontier reasoning models are 5–20x more vulnerable to adversarial prompts than non-reasoning models. The thinking process itself is the attack surface.
Reformulating harmful prompts as poetry bypasses safety filters across every major LLM family. A single-turn, universal jailbreak mechanism.
90% of companies plan to increase AI investment. 1% consider themselves AI-mature. The J-Curve explains why — and how to survive the trough.