
Aravind Srinivas 2
TalkRL: The Reinforcement Learning Podcast
TD Learning
Q-learning is a paradigm shift, because you don't have to do all this dynamic programming or policy gradients. You just let the deep neural network figure out what it means to optimize long-term reward. In some results, Decision Transformer gets about the same as TD learning, and in some cases it does better. How is it doing this? I was a little surprised that TD learning is represented by CQL. Would other algorithms do better at representing TD learning here? Many people are working on that.