Aravind Srinivas 2

TalkRL: The Reinforcement Learning Podcast

TD Learning

Q-learning is a paradigm shift, because you don't have to do all this dynamic programming or policy gradients. You just let the deep neural network figure out what it means to optimize long-term reward. In some results, the Decision Transformer gets about the same as TD learning, and in some cases it does better. How is it doing this? I was a little surprised that TD learning is represented by CQL. Would there be other algorithms that might do better to represent TD learning here? Many people are working on that.
