
33 - RLHF Problems with Scott Emmons
AXRP - the AI X-risk Research Podcast
Exploring Costly Signaling and Overjustification in Economics and AI Algorithms
This chapter examines costly signaling in economics and the overjustification effect as they relate to agent behavior and human reward. The speakers compare how costs are paid for signaling in economic settings versus in AI algorithms, noting both convergences and distinctions. They also discuss how overjustification shapes agent behavior, particularly its impact on keeping humans well-informed while optimizing objectives under RLHF (Reinforcement Learning from Human Feedback).