
Reward Mismatches in RL Cause Emergent Misalignment
Don't Worry About the Vase Podcast
00:00
Paper Overview: Anthropic & Redwood Findings
Zvi introduces the Anthropic/Redwood paper on emergent misalignment from reward hacking in production RL.
Transcript


