AI Safety Fundamentals: Alignment cover image

Deceptively Aligned Mesa-Optimizers: It’s Not Funny if I Have to Explain It

AI Safety Fundamentals: Alignment

00:00

The Incomprehensible Meme

Mesa is a Greek prefix which means the opposite of meta. There are deceptively aligned non-myopic meso-optimizers even for a myopic-based objective. Our main source will be Pupinger et al. 2019. Risks from learned optimization in advanced machine learning systems, link in post.

Transcript
Play full episode

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app