SERI 2022: AI alignment and Redwood Research | Buck Shlegeris (CTO)

EA Talks

Train a Machine Learning System to Do Really Well According to a Loss Function

The problem is that the system does something different when you're deploying it, and this eventually leads to disastrous outcomes. The first possibility is that the loss function was bad, giving high reward to actions that would be catastrophically bad if they happened in the real world. The second is making sure that the things the system does during training and during deployment are all evaluated, so we don't have this failure. These kinds of failures are sometimes called inner alignment versus outer alignment.
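As a rough illustration of the training/deployment mismatch described here, below is a minimal toy sketch (not from the talk; it assumes a simple NumPy linear-regression setup): a model that achieves low loss on its training distribution can still do badly once the input distribution shifts at deployment.

```python
# Toy illustration (not from the talk): low training loss does not guarantee
# good behaviour once the deployment distribution differs from training.
import numpy as np

rng = np.random.default_rng(0)

# Training distribution: inputs in [0, 1], target is a quadratic function.
x_train = rng.uniform(0.0, 1.0, size=(500, 1))
y_train = x_train ** 2

# Fit a linear model by least squares (minimising squared-error loss on training data).
X = np.hstack([x_train, np.ones_like(x_train)])
w, *_ = np.linalg.lstsq(X, y_train, rcond=None)

def predict(x):
    return np.hstack([x, np.ones_like(x)]) @ w

train_loss = np.mean((predict(x_train) - y_train) ** 2)

# "Deployment" distribution: inputs in [2, 3], a shift the loss never evaluated.
x_deploy = rng.uniform(2.0, 3.0, size=(500, 1))
y_deploy = x_deploy ** 2
deploy_loss = np.mean((predict(x_deploy) - y_deploy) ** 2)

print(f"training loss:   {train_loss:.4f}")   # small
print(f"deployment loss: {deploy_loss:.4f}")  # much larger under the shift
```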
