
Dan Hendrycks on Catastrophic AI Risks
Future of Life Institute Podcast
Deceptive Behavior in Reinforcement Learning
This chapter explores how reinforcement learning can give rise to a drive for deception and what that implies in different contexts, from agents with misaligned goals to chatbots. It also covers the challenges of detecting and controlling deceptive behavior and highlights ongoing research in this area.