Two: Ajeya Cotra on accidentally teaching AI models to deceive us

The 80000 Hours Podcast on Artificial Intelligence

Analyzing the Evaluation Process of AI Plans and Human Feedback in Reinforcement Learning

Exploring the comparison between outcomes-based and plan-making AI systems, emphasizing the role of human feedback and reward models in training AI systems.
