
110. Alex Turner - Will powerful AIs tend to seek power?

Towards Data Science


Is There a Broader Sense of Optimization That Doesn't Involve Reward Functions?

A. A. Dowd: It's been my perception that one of the things AI alignment generally has been missing is just a concrete arena in which to reason about these systems. And so I think once we start crossing that point, you might start to see some bad behaviour. The thing is that retargetable agents often have power-seeking tendencies, at least in the settings we look at. The reward function determines whether the optimal policies are good or bad for an agent. But if you give a randomly generated reward function for Pac-Man, it's like, well, Pac-Man's just going to tend to die, right? It's not…
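The counting argument behind that "randomly generated reward function" thought experiment is concrete enough to simulate. Below is a minimal sketch (my own toy construction, not the setup from Turner's papers): a five-state MDP in which one action from the start state leads to an absorbing "dead" state and three lead to absorbing "alive" states. Sampling reward functions uniformly at random and solving each by value iteration shows the optimal policy avoids death about three times out of four, simply because death is one terminal option among four.

```python
import numpy as np

# Toy sketch of the claim discussed above: when reward functions are sampled
# at random, optimal policies tend to avoid states that cut off future
# options. This is an illustrative construction, not the paper's actual setup.

N_STATES = 5               # 0 = start, 1-3 = alive (absorbing), 4 = dead
START, DEAD = 0, 4
ACTIONS = range(4)         # from the start state, action a leads to state a + 1
GAMMA = 0.9
N_SAMPLES = 10_000

def step(state, action):
    """Deterministic transition function for the toy MDP."""
    if state == START:
        return action + 1  # actions 0-2 stay alive, action 3 dies
    return state           # every non-start state is absorbing

def greedy_start_action(reward):
    """Solve by value iteration and return the optimal action at start."""
    v = np.zeros(N_STATES)
    for _ in range(200):   # plenty of sweeps for convergence at gamma = 0.9
        q = np.array([[reward[step(s, a)] + GAMMA * v[step(s, a)]
                       for a in ACTIONS] for s in range(N_STATES)])
        v = q.max(axis=1)
    return int(q[START].argmax())

rng = np.random.default_rng(0)
survived = sum(
    step(START, greedy_start_action(rng.uniform(size=N_STATES))) != DEAD
    for _ in range(N_SAMPLES)
)
print(f"optimal policy avoided death in {survived}/{N_SAMPLES} samples")
# Prints roughly 7500/10000, matching the 3-in-4 counting argument.
```

In real Pac-Man the asymmetry is far larger: staying alive preserves access to an enormous number of distinct future trajectories while game-over is a single terminal outcome, so under most reward functions the optimal policy avoids dying, which is the power-seeking tendency Turner describes.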

