ML Systems Will Have Weird Failure Modes

AI Safety Fundamentals: Alignment

Deceptive Alignment in Machine Learning Systems

This episode explores non-myopic agents and deceptive alignment in machine learning systems. It highlights the complexities and implications of behavior changes after deployment, and challenges common assumptions about reward optimization and alignment in models.
