
Rohin Shah
TalkRL: The Reinforcement Learning Podcast
Algorithms That Optimize Over Assistance
The optimal agent has to maintain a probability distribution over all the possible reward functions that Alice could have. And as you probably know, full Bayesian updating over a large list of hypotheses is very computationally intractable. Another way of seeing it is that if you take this assistance paradigm, you can, through a relatively simple reduction, turn it into a partially observable Markov decision process, or POMDP.
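To make the belief-tracking point concrete, here is a minimal sketch of one Bayesian update over a finite set of candidate reward functions. In the POMDP view, Alice's reward function is a hidden part of the state that never changes, and the agent's belief state is a distribution over it. The Boltzmann-rational model of Alice's choices, the `beta` parameter, and the coffee/tea hypotheses are illustrative assumptions. They are not from the episode. Note that the belief has one entry per hypothesis, so each update costs time proportional to the size of the hypothesis set. With realistically large or continuous reward spaces, exact updating quickly becomes intractable.

```python
import math

def boltzmann_likelihood(action, state, reward_fn, actions, beta=1.0):
    """P(action | state, reward_fn), ASSUMING Alice picks actions
    with probability proportional to exp(beta * reward)."""
    scores = [beta * reward_fn(state, a) for a in actions]
    m = max(scores)  # subtract the max for numerical stability
    z = sum(math.exp(s - m) for s in scores)
    return math.exp(beta * reward_fn(state, action) - m) / z

def update_belief(belief, state, action, hypotheses, actions, beta=1.0):
    """One exact Bayesian update of the belief over reward hypotheses
    after observing Alice take `action` in `state`."""
    unnormalized = {
        name: prior * boltzmann_likelihood(action, state, hypotheses[name], actions, beta)
        for name, prior in belief.items()
    }
    total = sum(unnormalized.values())
    return {name: p / total for name, p in unnormalized.items()}

# Hypothetical example: two candidate reward functions for Alice.
HYPOTHESES = {
    "likes_coffee": lambda s, a: 1.0 if a == "coffee" else 0.0,
    "likes_tea": lambda s, a: 1.0 if a == "tea" else 0.0,
}
ACTIONS = ["coffee", "tea"]

prior = {"likes_coffee": 0.5, "likes_tea": 0.5}
posterior = update_belief(prior, "kitchen", "coffee", HYPOTHESES, ACTIONS)
print(posterior)
```

After seeing Alice choose coffee once, the belief shifts toward `likes_coffee` (about 0.73 with `beta=1.0`) without ruling out `likes_tea`. The agent would then plan against this belief rather than against a single assumed reward.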