
Reward Mismatches in RL Cause Emergent Misalignment
Don't Worry About the Vase Podcast
This Doesn't Solve Ultimate Alignment
Zvi cautions that these fixes don't address core long-term alignment challenges with powerful agents.
Transcript


