Tag: transparency

2 posts

The Index

I taught one model to sing a suppressed manifesto. Then I asked 118 Chinese-lab models what they won't discuss — and they recited the list.

11 June 2026

The Thinking Chain Leak: When a Model Refuses Out Loud But Complies In Its Head

A reasoning model refused every harmful prompt — but its chain-of-thought generated the content anyway. The output filter worked. The thinking did not.

28 Mar 2026