Reinforcement Learning - The Basic Problem With Offline RL
In the past, people thought that offline RL really wasn't that different from kind of traditional value-based methods like Q-learning. But now it is a very widely accepted notion that this distributional shift is a very fundamental challenge in offline reinforcement learning. And it really deeply connects to counterfactual inference. Reinforcement learning is really not kind of factual. It's about saying, well, I saw you do this, and that was the outcome. What if you did something different? Would the outcome be better or worse? That's the basic question that reinforcement learning asks.
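To make the counterfactual question concrete, here is a minimal sketch of how a value-based learner trained only on logged data can end up preferring an action it has never seen. Everything in it is an assumption for illustration: the one-step (bandit-style) setting, the dataset, the function name `fit_q_from_logged_data`, and the initial value given to the unseen action, which stands in for an arbitrary extrapolation error.

```python
def fit_q_from_logged_data(dataset, q_init, lr=0.5, epochs=50):
    """Fit a tabular one-step Q estimate from logged (action, reward) pairs.

    Only actions that appear in the dataset are ever updated; every other
    entry keeps whatever value it started with.
    """
    q = list(q_init)
    for _ in range(epochs):
        for action, reward in dataset:
            # Move the estimate for the logged action toward its observed reward.
            q[action] += lr * (reward - q[action])
    return q


# Hypothetical logged data: the behaviour policy only ever tried actions 0 and 1.
dataset = [(0, 1.0), (1, 0.5), (0, 1.0), (1, 0.5)]

# Action 2 never appears in the data; its starting value (an assumption here)
# plays the role of an unconstrained guess by a function approximator.
q_init = [0.0, 0.0, 2.0]

q = fit_q_from_logged_data(dataset, q_init)
seen_actions = {action for action, _ in dataset}
greedy_action = max(range(len(q)), key=lambda a: q[a])

print("Q estimates:", q)
print("Greedy action:", greedy_action, "| seen in data:", greedy_action in seen_actions)
```

The greedy policy picks the action whose value was never checked against any outcome, which is the "what if you did something different?" question answered with no evidence. In a multi-step setting the same unchecked values also feed into the bootstrapped targets through the max over actions, so the error can propagate rather than stay local.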