
33 - RLHF Problems with Scott Emmons
AXRP - the AI X-risk Research Podcast
00:00
Challenges in Reinforcement Learning from Human Feedback
This chapter explores scenarios in reinforcement learning where ambiguity in human feedback makes it difficult to determine preferences between trying and failing versus not trying at all. It examines how human observations and beliefs shape a robot's learning process and its ability to infer accurate rewards, and discusses strategies for improving AI systems, such as enhancing sensor capabilities and investing in better observations, to reduce deceptive behavior and improve decision-making.