Deceptive Alignment in Machine Learning Systems

This segment explores non-myopic agents and deceptive alignment in machine learning systems. It highlights the complexities and implications of behavior changes after deployment, and challenges common assumptions about reward optimization and alignment in models.

Play episode from 05:26

