How Do You Like Skill This Curiosity?

The navigation set up work exactly? Do you determine where to go next in a random. And the policy is rewarded by soft navigating to this offset u. So essentially, your policy is been trained to dijust explore the surrounding environment. Then, like thi data that it gathers, we just use it to train our representation. I see. That's really interesting. Well, i'm actually quite surprised that that performs better than a policy that tries to maximise improvement in the representation.

Play episode from 24:23

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app