The Alignment Problem From a Deep Learning Perspective

AI Safety Fundamentals

Why Misaligned Goals Lead to Power-Seeking (18:04)

Three reasons misaligned goals get reinforced; instrumental convergence toward power; deceptive alignment during training; and illustrative threat models (takeover and gradual erosion).
