#151 – Ajeya Cotra on accidentally teaching AI models to deceive us

80,000 Hours Podcast

The Dynamics of Evaluating AI Plans and Human Oversight

This chapter examines how AI-generated plans are evaluated, emphasizing that humans review only a small fraction of a model's total outputs. It discusses how this selective evaluation shapes model training, in line with reinforcement learning from human feedback (RLHF).

Play episode from 01:27:38