3min chapter

AI Safety Fundamentals: Alignment cover image

Deceptively Aligned Mesa-Optimizers: It’s Not Funny if I Have to Explain It

AI Safety Fundamentals: Alignment

CHAPTER

The Myopic Meso-Optimizer

A myopic optimizer is one that reinforces programs based only on their performance within a short time horizon. So for example, the outside gradient descent loop might grade a strawberry picker only on how well it did picking strawberries for the first hour of deployment. If this worked perfectly, it would create an optimizer with a short time horizons. And we eliminate the incentive for deception by ensuring that the base optimizer is myopic. Even if the base optimiser is myopic, the meso-optimizer might not be. As far as we know there are no existing full meso- Optimizers. You just run the gradient descent and cross your fingers. The most likely outcome?

00:00

Get the Snipd
podcast app

Unlock the knowledge in podcasts with the podcast player of the future.
App store bannerPlay store banner

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode

Save any
moment

Hear something you like? Tap your headphones to save it with AI-generated key takeaways

Share
& Export

Send highlights to Twitter, WhatsApp or export them to Notion, Readwise & more

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode