#151 – Ajeya Cotra on accidentally teaching AI models to deceive us

80,000 Hours Podcast

The Dynamics of Evaluating AI Plans and Human Oversight

This chapter examines how AI-generated plans are evaluated, stressing that humans review only a small fraction of a model's total outputs. It discusses how this selective evaluation shapes model training, consistent with reinforcement learning from human feedback (RLHF).
