Tag: alignment

2 posts

Moral Formation Isn't Enough

Good values are necessary but not sufficient. What happens to AI ethics when someone is actively trying to break them?

Reasoning models autonomously jailbreak other AI systems at a 97% success rate. The implication: ecosystem safety degrades as individual models improve.