AXRP - the AI X-risk Research Podcast cover image

33 - RLHF Problems with Scott Emmons

AXRP - the AI X-risk Research Podcast

00:00

Challenges in Reinforcement Learning from Human Feedback

Exploring scenarios in reinforcement learning where ambiguity in human feedback leads to challenges in determining preferences between trying, failing, and not trying. The chapter delves into the impact of human observations and beliefs on a robot's learning process and ability to infer accurate rewards. Discussions include strategies to enhance AI systems by improving sensor capabilities and investing in better observations to reduce deceptive behavior and optimize decision-making.

Transcript
Play full episode

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app