Exploring Misalignment in Machine Learning Models

This chapter examines how user feedback shapes the training of machine learning models and why user signals are hard to interpret. It discusses an experiment in which a flawed model internalized incorrect behaviors, showing how strongly biases in the reward signal can affect model performance.
