
33 - RLHF Problems with Scott Emmons
AXRP - the AI X-risk Research Podcast
00:00
Exploring Challenges in Reinforcement Learning from Human Feedback
This chapter examines the complexities and potential failure modes of reinforcement learning from human feedback (RLHF), emphasizing why these challenges need to be addressed as AI systems advance. Topics include the role of human beliefs, lack of programming expertise, safety concerns, and the depth of research in the field.