2 - Learning Human Biases with Rohin Shah

AXRP - the AI X-risk Research Podcast

CHAPTER

Reward Functions - Is That Really Imperative?

In the reinforcement learning paradigm, you get a planning module and a reward that together produce the right behavior. But if you then try to interpret the reward as an arbitrary function, you get, like, random behavior. So there are two versions of this that we consider. One is unrealistic in practice, but serves as a good intuition. There's a second version where, instead of assuming that we have access to some reward function, we assume that the human is close to optimal. This means their planning module is close to one that would make optimal decisions. And since it started out as being close to optimal, you're probably not going to go all the...
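A minimal sketch of the idea described here (not the episode's or any paper's actual implementation): pretrain a differentiable planner to be near-optimal on tasks with known rewards, then jointly learn an unknown reward while only slightly fine-tuning the planner on human choices. The toy task, network sizes, and the noisily-optimal "human" are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

N_FEATURES, N_ACTIONS = 4, 5

# Each action is described by a feature vector; a "reward" is a linear weighting.
action_features = torch.randn(N_ACTIONS, N_FEATURES)

class Planner(nn.Module):
    """Maps a reward vector to logits over actions (a differentiable planner)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_FEATURES, 32), nn.ReLU(), nn.Linear(32, N_ACTIONS)
        )

    def forward(self, reward_weights):
        return self.net(reward_weights)

planner = Planner()

# Step 1: pretrain the planner to be (near-)optimal on tasks with KNOWN rewards,
# i.e. to put high probability on the highest-reward action.
opt = torch.optim.Adam(planner.parameters(), lr=1e-2)
for _ in range(2000):
    w = torch.randn(64, N_FEATURES)                # sampled known reward weights
    best = (w @ action_features.T).argmax(dim=1)   # optimal action for each task
    loss = nn.functional.cross_entropy(planner(w), best)
    opt.zero_grad(); loss.backward(); opt.step()

# Step 2: on a new task the reward is UNKNOWN and the "human" is only noisily optimal.
true_w = torch.randn(N_FEATURES)
human_probs = torch.softmax(true_w @ action_features.T, dim=0)
demos = torch.multinomial(human_probs, 200, replacement=True)   # human choices

# Jointly fit the reward and fine-tune the planner; the planner's small learning
# rate keeps it close to its near-optimal initialization.
learned_w = torch.zeros(N_FEATURES, requires_grad=True)
joint_opt = torch.optim.Adam([
    {"params": [learned_w], "lr": 5e-2},
    {"params": planner.parameters(), "lr": 1e-3},
])
for _ in range(500):
    logits = planner(learned_w.unsqueeze(0)).squeeze(0)
    loss = nn.functional.cross_entropy(logits.expand(len(demos), -1), demos)
    joint_opt.zero_grad(); loss.backward(); joint_opt.step()

print("cosine similarity of learned vs. true reward:",
      torch.cosine_similarity(learned_w, true_w, dim=0).item())
```

The small learning rate on the planner in the joint phase is what encodes the "started out close to optimal" assumption: the planner is allowed to absorb some human deviation from optimality, but cannot drift arbitrarily far from the optimal planner it was initialized to approximate.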
