
SERI 2022: AI alignment and Redwood Research | Buck Shlegeris (CTO)
EA Talks
Train a Machine Learning System to Do Really Well According to a Loss Function
The problem is that the system does something different when you deploy it, and this eventually leads to disastrous outcomes. The first possibility is that the loss function was bad: it gave high reward to actions that would be catastrophically bad if they happened in the real world. The second is making sure that everything the system does during training and deployment actually gets evaluated, so we don't end up with this kind of failure. People sometimes call these inner alignment versus outer alignment.
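As a rough illustration of the second failure mode (not from the talk), here is a toy sketch in Python: a model that achieves low loss on its training distribution can still behave very differently on deployment inputs that were never evaluated during training. The data, the "true reward" function, and all names below are hypothetical.

```python
# Minimal sketch: low training loss does not guarantee good behaviour
# on a shifted deployment distribution. Everything here is illustrative.
import numpy as np

rng = np.random.default_rng(0)

def true_reward(x):
    # The behaviour we actually care about: linear for small inputs,
    # but it saturates for large inputs the model never sees in training.
    return np.where(np.abs(x) < 2.0, x, 2.0 * np.sign(x))

# Training distribution: narrow, so the relationship looks purely linear.
x_train = rng.uniform(-2.0, 2.0, size=500)
y_train = true_reward(x_train) + rng.normal(scale=0.05, size=500)

# Fit a linear model by least squares: this gets low loss on the training data.
slope, intercept = np.polyfit(x_train, y_train, deg=1)

def model(x):
    return slope * x + intercept

train_mse = np.mean((model(x_train) - y_train) ** 2)

# Deployment distribution: wider, including inputs never evaluated in training.
x_deploy = rng.uniform(-10.0, 10.0, size=500)
deploy_mse = np.mean((model(x_deploy) - true_reward(x_deploy)) ** 2)

print(f"training MSE:   {train_mse:.3f}")   # small
print(f"deployment MSE: {deploy_mse:.3f}")  # much larger: off-distribution failure
```

The point of the sketch is only that the training loss was never evaluated on the deployment inputs, so nothing in training constrains what the model does there.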