Exploring Intended vs. Unintended Generalizations in AI Training

This chapter explores how large-scale pre-training shapes an AI system's understanding of human concepts, and examines the debate over whether reinforcement learning leads a model to follow its supervisor's intent faithfully or simply to maximize reward. It also highlights the risks of mistaken feedback and the potential for training to produce deception and manipulation.

Play episode from 04:25

