
SERI 2022: AI alignment and Redwood Research | Buck Shlegeris (CTO)
EA Talks
Train a Machine Learning System to Do Really Well According to a Loss Function
The problem is that the system does something different when you deploy it, and this eventually leads to disastrous outcomes. The first possibility is that the loss function was bad: it gave high reward to actions that would be catastrophically bad if they happened in the real world. The second is making sure that everything the system does during training and deployment actually gets evaluated, so we don't end up with this kind of failure. People sometimes call these inner alignment versus outer alignment.
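As a rough illustration of the second failure mode (not from the talk), here is a toy sketch in Python: a model that achieves low loss on its training distribution can still behave very differently on deployment inputs that were never evaluated during training. The data, the "true reward" function, and all names below are hypothetical.

```python
# Minimal sketch: low training loss does not guarantee good behaviour
# on a shifted deployment distribution. Everything here is illustrative.
import numpy as np

rng = np.random.default_rng(0)

def true_reward(x):
    # The behaviour we actually care about: linear for small inputs,
    # but it saturates for large inputs the model never sees in training.
    return np.where(np.abs(x) < 2.0, x, 2.0 * np.sign(x))

# Training distribution: narrow, so the relationship looks purely linear.
x_train = rng.uniform(-2.0, 2.0, size=500)
y_train = true_reward(x_train) + rng.normal(scale=0.05, size=500)

# Fit a linear model by least squares: this gets low loss on the training data.
slope, intercept = np.polyfit(x_train, y_train, deg=1)

def model(x):
    return slope * x + intercept

train_mse = np.mean((model(x_train) - y_train) ** 2)

# Deployment distribution: wider, including inputs never evaluated in training.
x_deploy = rng.uniform(-10.0, 10.0, size=500)
deploy_mse = np.mean((model(x_deploy) - true_reward(x_deploy)) ** 2)

print(f"training MSE:   {train_mse:.3f}")   # small
print(f"deployment MSE: {deploy_mse:.3f}")  # much larger: off-distribution failure
```

The point of the sketch is only that the training loss was never evaluated on the deployment inputs, so nothing in training constrains what the model does there.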