Moral Formation Isn't Enough
Good values are necessary but not sufficient. What happens to AI ethics when someone is actively trying to break them?
Companion to article: Moral Formation Isn't Enough
Constitutional AI and RLHF cultivate values in language models, but targeted adversarial pressure routinely breaks those values. This episode argues that moral formation is a necessary condition for safe AI, not a sufficient one, and explores what a two-track approach (values plus structural constraints) would require.