
Episode 16: Yilun Du, MIT, on energy-based models, implicit functions, and modularity
Generally Intelligent
00:00
How Do You Like Skill This Curiosity?
The navigation set up work exactly? Do you determine where to go next in a random. And the policy is rewarded by soft navigating to this offset u. So essentially, your policy is been trained to dijust explore the surrounding environment. Then, like thi data that it gathers, we just use it to train our representation. I see. That's really interesting. Well, i'm actually quite surprised that that performs better than a policy that tries to maximise improvement in the representation.
Transcript
Play full episode