TD (Temporal Difference) Is Just an Algorithm for Reinforcement Learning
TD, temporal difference learning, is about making predictions over time. You can try to use it for making decisions, right? Because if you can predict how good an action and its outcomes will be in the future, you can choose the one with the better predicted outcome. And so, for people who don't know TD, temporal difference: these are all just algorithms for reinforcement learning. Right. Q-learning was off-policy, which meant that you could actually be learning about the environment and what the value of different actions would be while still figuring out how to behave optimally. So that was a revelation.
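To make the two ideas concrete, here is a minimal sketch in Python. It is not from the conversation: the five-state chain environment, the constants, and every function name are assumptions chosen for illustration. `td0_update` shows the prediction side (nudging a value estimate toward the reward plus the discounted next prediction), and `q_learning` shows the off-policy side: the agent behaves completely at random, yet the table it learns describes the values of acting greedily.

```python
import random

# Hypothetical toy environment: a chain of states 0..4, where 4 is terminal.
N_STATES = 5
LEFT, RIGHT = 0, 1
GAMMA = 0.9   # discount factor (assumed value)
ALPHA = 0.5   # step size (assumed value)


def step(state, action):
    """Deterministic transition; reward 1 only on reaching the last state."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def td0_update(values, state, reward, next_state, done):
    """TD(0) prediction: move V(s) toward reward + gamma * V(s')."""
    target = reward + (0.0 if done else GAMMA * values[next_state])
    values[state] += ALPHA * (target - values[state])


def q_learning(episodes=500, seed=0):
    """Tabular Q-learning driven by a purely random behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Behavior policy: random actions, unrelated to the current estimates.
            action = rng.choice((LEFT, RIGHT))
            next_state, reward, done = step(state, action)
            # The target uses the best next action, not the one the agent
            # will actually take next. That is what makes it off-policy.
            best_next = 0.0 if done else max(q[next_state])
            td_error = reward + GAMMA * best_next - q[state][action]
            q[state][action] += ALPHA * td_error
            state = next_state
    return q


q_table = q_learning()
greedy_policy = [
    max((LEFT, RIGHT), key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)
]
print(greedy_policy)  # one greedy action per non-terminal state
```

The only difference between the two updates is the target: TD(0) bootstraps from the predicted value of the next state, while Q-learning bootstraps from the best action value in the next state, which is why it can learn about good behavior while acting randomly.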