
Deceptively Aligned Mesa-Optimizers: It’s Not Funny if I Have to Explain It

AI Safety Fundamentals: Alignment


The Importance of Inner Alignment

The process will probably start with us running a gradient descent loop with some kind of objective function. That will produce a mesa-optimizer with some other, potentially different objective function. Outer alignment problems tend to sound like the sorcerer's apprentice: we tell the AI to pick strawberries, but we forget to include caveats and stop signals. The AI becomes superintelligent and converts the whole world into strawberries so it can pick as many as possible. This is the scenario that a lot of AI alignment research focuses on. Either way, on purpose or by accident, we create the first true planning agent.
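To make the setup concrete, here is a minimal sketch (an illustrative assumption, not from the episode) of the outer loop being described: a base optimizer, plain gradient descent, minimizing a base objective. The inner alignment worry is that the model such a loop produces may itself do optimization, toward a different mesa-objective than the one the loop was scoring it on.

```python
# A base optimizer: gradient descent on a base objective.
# The function and learning rate below are hypothetical, chosen
# only to show the shape of the loop the transcript refers to.

def base_objective(theta):
    # Base objective: (theta - 3)^2, minimized at theta = 3.
    return (theta - 3.0) ** 2

def gradient(theta):
    # Analytic derivative of the base objective.
    return 2.0 * (theta - 3.0)

theta = 0.0
for step in range(1000):
    theta -= 0.1 * gradient(theta)  # gradient descent update

print(round(theta, 3))  # converges near 3.0
```

The loop only ever sees the base objective's gradient; nothing in it inspects what objective the trained artifact ends up pursuing, which is exactly the gap inner alignment is about.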

