AXRP - the AI X-risk Research Podcast

33 - RLHF Problems with Scott Emmons

Jun 12, 2024
Scott Emmons discusses challenges in Reinforcement Learning from Human Feedback (RLHF), including deceptive inflation, overjustification, bounded human rationality, and possible solutions. He also touches on dimensional analysis and his broader research program, emphasizing why addressing these challenges matters for AI systems.