'Simulators' by Janus

LessWrong (Curated & Popular)

GPT Training Isn't Predictive Orthogonality

GPT's training doesn't incentivize instrumental behavior that optimizes prediction accuracy. GPT does not generate rollouts during training, and its output is never sampled to yield actions whose consequences are evaluated. At best, the "agent" is an epicycle, and the framing is also compatible with interpretations that generate dubious predictions.
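The point about rollouts can be made concrete with a toy sketch of teacher-forced next-token training. This is a minimal NumPy illustration, not actual GPT code: the "model" here is just a fixed random logit table, and the sequence is random. What it shows is the shape of the training signal: the loss depends only on the probability the model assigns to each actual next token in the data, and at no step is the model's output sampled to produce an action whose consequences are then evaluated.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab, seq_len = 5, 6
tokens = rng.integers(0, vocab, size=seq_len)   # a training sequence from the corpus
logits = rng.normal(size=(seq_len - 1, vocab))  # stand-in for model outputs at each position

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

probs = softmax(logits)
targets = tokens[1:]  # the next token at each position, taken from the data

# The loss is the mean negative log-probability of the *actual* next
# tokens. Nothing here samples from `probs` or scores the consequences
# of a sampled continuation -- the gradient signal is purely myopic.
loss = -np.log(probs[np.arange(seq_len - 1), targets]).mean()
print(float(loss))
```

Contrast this with an RL-style objective, where the model's own samples would be fed back in as actions and a reward on the resulting trajectory would shape the update; that feedback loop is what the excerpt says is absent from GPT's training.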
