
Rohin Shah

TalkRL: The Reinforcement Learning Podcast


Algorithms That Optimize Over Assistance

The optimal agent has to maintain a probability distribution over all the possible reward functions that Alice could have. And as you probably know, full Bayesian updating over a large list of hypotheses is computationally intractable. Another way of seeing it is that if you take this assistance paradigm, you can, through a relatively simple reduction, turn it into a partially observable Markov decision process, or POMDP.
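To make the belief-maintenance step concrete, here is a minimal Python sketch. It rests on illustrative assumptions that are not from the episode: a finite set of candidate reward functions over a few actions, and a Boltzmann-rational model of how Alice chooses actions. The names `boltzmann_likelihood`, `update_belief`, and `beta` are hypothetical. In the POMDP view, which reward function Alice has is the hidden part of the state, and the normalized distribution below is the agent's belief state over it.

```python
import math
from typing import Dict

# Hypothetical representation: each candidate reward function maps an action to a reward.
RewardFn = Dict[str, float]


def boltzmann_likelihood(reward: RewardFn, action: str, beta: float = 1.0) -> float:
    """P(action | reward) under an ASSUMED Boltzmann-rational model of Alice."""
    z = sum(math.exp(beta * r) for r in reward.values())
    return math.exp(beta * reward[action]) / z


def update_belief(belief: Dict[str, float],
                  hypotheses: Dict[str, RewardFn],
                  observed_action: str,
                  beta: float = 1.0) -> Dict[str, float]:
    """One exact Bayesian update over a finite set of reward hypotheses.

    Prior belief times likelihood of the observed action, then normalize.
    Cost is linear in the number of hypotheses.
    """
    unnormalized = {
        h: p * boltzmann_likelihood(hypotheses[h], observed_action, beta)
        for h, p in belief.items()
    }
    total = sum(unnormalized.values())
    return {h: w / total for h, w in unnormalized.items()}


if __name__ == "__main__":
    # Two made-up hypotheses about what Alice values.
    hypotheses = {
        "likes_coffee": {"coffee": 1.0, "tea": 0.0},
        "likes_tea": {"coffee": 0.0, "tea": 1.0},
    }
    belief = {"likes_coffee": 0.5, "likes_tea": 0.5}
    belief = update_belief(belief, hypotheses, "coffee")
    print(belief)
```

Each update touches every hypothesis once, so it is cheap here; but if the reward were described by k binary features, there would be 2^k hypotheses to track, which is one way to see why exact updating becomes intractable.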

