
Reward Mismatches in RL Cause Emergent Misalignment
Don't Worry About the Vase Podcast
Inoculation Caveats and Verifiability
Zvi discusses inoculation limits, verifiability of crucial tasks, and operators being fooled by agents.
Chapter begins at 12:44.


