
Rohin Shah
TalkRL: The Reinforcement Learning Podcast
Algorithms That Optimize Over Assistance
The optimal agent has to maintain a probability distribution over all the possible reward functions that Alice could have. And as you probably know, full Bayesian updating over a large list of hypotheses is very computationally intractable. Another way of seeing it is that if you take this assistance paradigm, you can, through a relatively simple reduction, turn it into a partially observable Markov decision process, or POMDP.
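
To make the belief-maintenance point concrete, here is a minimal sketch of one Bayesian update over a small, explicit list of reward hypotheses. It is an illustration, not the method discussed in the episode: the three reward hypotheses, the two-action setup, the Boltzmann (noisily rational) model of Alice, and the names `update_belief` and `boltzmann_likelihood` are all assumptions made for this example.

```python
import math


def boltzmann_likelihood(rewards, action, beta=1.0):
    """P(action | reward hypothesis), ASSUMING Alice is noisily rational.

    `rewards` lists the reward of each action under one hypothesis.
    The Boltzmann model is an assumption of this sketch.
    """
    weights = [math.exp(beta * r) for r in rewards]
    return weights[action] / sum(weights)


def update_belief(belief, likelihoods):
    """One Bayesian update: posterior is proportional to prior times likelihood."""
    unnormalized = [p * l for p, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return [u / total for u in unnormalized]


# Hypothetical reward functions: each row gives the reward of action 0 and action 1.
hypotheses = [
    [1.0, 0.0],  # Alice prefers action 0
    [0.0, 1.0],  # Alice prefers action 1
    [0.5, 0.5],  # Alice is indifferent
]

# Uniform prior over the hypotheses.
belief = [1.0 / len(hypotheses)] * len(hypotheses)

# The agent observes Alice choosing action 0 and updates its belief.
observed_action = 0
likelihoods = [boltzmann_likelihood(h, observed_action) for h in hypotheses]
belief = update_belief(belief, likelihoods)

for h, p in zip(hypotheses, belief):
    print(h, round(p, 3))
```

Each update touches every hypothesis once, so it costs time linear in the size of the list. The trouble is the size of the list itself: if a reward function is described by several discrete features, the number of candidate reward functions grows exponentially with the number of features, and enumerating them like this stops being practical.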