Introduction

GPT policies do not seem globally agentic, yet can be conditioned to behave in goal-directed ways. This post describes a frame that enables more natural reasoning about properties like agency. The outer objective of self-supervised learning is base optimal conditional inference over the prior of the training distribution - which I call the simulation objective. Because a conditional model can be used to simulate rollouts, which probabilistically obey its learned distribution,. by iteratively sampling from its posterior predictions, and updating the condition, prompt. Analogously, a predictive model of physics can be use to compute rollouts of phenomena in simulation. And here we have an image. From doll E2 it's a response to

Play episode from 00:00

Transcript

Episode notes

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app