AI Safety Fundamentals: Alignment cover image

Deceptively Aligned Mesa-Optimizers: It’s Not Funny if I Have to Explain It

AI Safety Fundamentals: Alignment

00:00

How to Train a Robot to Pick Strawberries

The ordinary life equivalent is teaching to the test. The system's programmers, for example the Department of Education, have an objective. Then they delegate that objective to Mesa Optimizers via a proxy objective. Teachers get paid more if their students get higher test scores. To good-heart is to follow the letter of your reward function rather than the spirit.

Transcript
Play full episode

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app