Is There Really a Goal-Conditioned RL Thing?

A lot of the prior work on this problem has a very episodic way of thinking. So, for example, if you're closing a door, you start from the open state at 90 degrees and then you close the door, and then you learn a separate policy that just brings you back to the 90-degree state. Our observation here is, first of all, that it's kind of wasteful to go almost all the way back. Second, and this is the main thing, along the optimal trajectory you can actually learn from intermediate points, and that can be far more efficient. Yeah. That way you can, in theory, do RL without very many resets, at least, or…
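
To make the "wasteful to go almost all the way back" point concrete, here is a minimal sketch under loudly stated assumptions: the door is modeled as a hypothetical deterministic 1D system whose angle moves in 10-degree steps, each practice episode ends with the door closed at 0 degrees, and the reset policy reopens it one step at a time. It only counts reset steps; it compares always resetting to the fully open 90-degree state against cycling through intermediate angles on the closing trajectory. All names (`reset_steps`, `episodic_starts`, `intermediate_starts`, and so on) are illustrative and are not taken from the work being discussed.

```python
# Hypothetical toy model of the door example (an assumption, not the
# speaker's setup): the angle moves in 10-degree increments,
# "closed" is 0 degrees and "fully open" is 90 degrees.
CLOSED_ANGLE = 0
OPEN_ANGLE = 90
STEP_DEGREES = 10


def reset_steps(target_angle: int) -> int:
    """Steps a reset policy needs to reopen the door from closed to target_angle."""
    return (target_angle - CLOSED_ANGLE) // STEP_DEGREES


def total_reset_steps(start_angles: list[int]) -> int:
    """Total reset steps when each practice episode closes the door and the
    reset policy then reopens it to the next start angle in the list."""
    return sum(reset_steps(angle) for angle in start_angles)


def episodic_starts(num_episodes: int) -> list[int]:
    """Episodic setup: always reset all the way back to fully open."""
    return [OPEN_ANGLE] * num_episodes


def intermediate_starts(num_episodes: int) -> list[int]:
    """Alternative: cycle through intermediate angles on the closing
    trajectory (10, 20, ..., 90 degrees) instead of always returning to 90."""
    waypoints = list(range(CLOSED_ANGLE + STEP_DEGREES, OPEN_ANGLE + 1, STEP_DEGREES))
    return [waypoints[i % len(waypoints)] for i in range(num_episodes)]


if __name__ == "__main__":
    n = 9
    print("episodic reset steps:", total_reset_steps(episodic_starts(n)))
    print("intermediate reset steps:", total_reset_steps(intermediate_starts(n)))
```

With nine practice episodes the episodic scheme spends 81 reset steps, while cycling through the intermediate angles spends 45 (1 + 2 + … + 9). Every intermediate start still lies on the closing trajectory, so the forward policy gets practice from each state it will pass through; how those starts should actually be chosen during learning is beyond what this toy model shows.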
