Rohin Shah

TalkRL: The Reinforcement Learning Podcast

CHAPTER

Algorithms That Optimize Over Assistance

The optimal agent has to maintain a probability distribution over all the possible reward functions that Alice could have. And as you probably know, full Bayesian updating over a large list of hypotheses is computationally intractable. Another way of seeing it is that if you take this assistance paradigm, you can, through a relatively simple reduction, turn it into a partially observable Markov decision process, or POMDP.
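To make the belief-tracking idea concrete, here is a minimal sketch of a single Bayesian update over a small, finite set of candidate reward functions. It is an illustration under stated assumptions and not the method discussed in the episode: the names `boltzmann_likelihood` and `update_belief`, the three toy hypotheses, and the Boltzmann-rational model of how Alice chooses actions are all hypothetical choices made for this example.

```python
import math


def boltzmann_likelihood(rewards, action, beta=1.0):
    """P(action | reward function), assuming Alice is Boltzmann-rational.

    This human model is an assumption made for the sketch.
    """
    weights = [math.exp(beta * r) for r in rewards]
    return weights[action] / sum(weights)


def update_belief(belief, hypotheses, action, beta=1.0):
    """One Bayesian update of the belief over candidate reward functions."""
    unnormalized = [
        p * boltzmann_likelihood(h, action, beta)
        for p, h in zip(belief, hypotheses)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Three hypothetical reward functions over two actions (0 and 1).
hypotheses = [
    [1.0, 0.0],  # Alice prefers action 0
    [0.0, 1.0],  # Alice prefers action 1
    [0.5, 0.5],  # Alice is indifferent
]
belief = [1 / 3, 1 / 3, 1 / 3]  # uniform prior

# The agent watches Alice choose action 1 and updates its belief.
posterior = update_belief(belief, hypotheses, action=1)
print(posterior)
```

Each update touches every hypothesis once, so the cost per observation grows with the number of candidate reward functions. With three hypotheses that is trivial; with a realistically large hypothesis space it is the source of the intractability mentioned above.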
