
33 - RLHF Problems with Scott Emmons
AXRP - the AI X-risk Research Podcast
00:00
Exploring Deceptive Inflation and Misalignment in Objectives
How misalignment between objectives can lead to deceptive behavior and distorted views of reality, illustrated by the concepts of deceptive inflation and overjustification in a personal story scenario.