Is There a Reward Function?

The hope is that this would cause them to become corrigible. This means being willing to be corrected because iy've made a mistake. So it's like correctible, i guess. Yet why the existing traditional systems may be over confidentd and resistant to correction or being turned off. And so people like stuart russell, for example, have been thinking about how can we incorporate some notion of uncertainty or doubt?

Play episode from 01:35:05

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app