Exploring Training Schemes and Model Behavior

This chapter examines how different training schemes, including word prediction and reinforcement learning, affect AI models' behavior. It also explores the mixed signals and inconsistency seen in models trained with RL, and asks whether it is desirable for models to exhibit a desire to be shut down.

Play episode from 50:30

