

33 - RLHF Problems with Scott Emmons
Jun 12, 2024
Scott Emmons discusses challenges in Reinforcement Learning from Human Feedback (RLHF): deceptive inflation, overjustification, bounded human rationality, and possible solutions. He also touches on dimensional analysis and his broader research program, emphasizing the importance of addressing these challenges in AI systems.
Chapters
00:00 • Intro (2 min)
02:11 • Exploring Deceptive Inflation and Misalignment in Objectives (2 min)
03:47 • Deceptive Inflation in Reinforcement Learning from Human Feedback (19 min)
22:42 • Exploring Costly Signaling and Overjustification in Economics and AI Algorithms (10 min)
32:50 • Partial Observability and Rationality in Human-AI Interaction (14 min)
47:05 • Navigating Belief Calibration in Human-Robot Interactions (12 min)
59:24 • Challenges in Reinforcement Learning from Human Feedback (24 min)
01:23:17 • Exploring Challenges in Reinforcement Learning from Human Feedback (7 min)
01:30:30 • Discussion on a Research Paper and its Place in AI Risk Research (3 min)
01:33:11 • Exploring Alignment and Incentives in Reinforcement Learning (5 min)
01:38:22 • Exploring Algorithms and Human Beliefs in Research (3 min)