Random Network Distillation - Is There a Reward?

So what if we thought about single-agent RL as more of a two-agent problem? Essentially, you want to do a similar type of curriculum, where you play against the environments in which you did poorly. The idea is that at the states you visit more often, your predictor network is going to be better at predicting the fixed random network. So you can basically take the error in predicting the random network at each state as a measure of novelty for that state. But ultimately we came up with a much simpler heuristic, which was basically just the value function loss.
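
A minimal sketch of the two ideas mentioned above, assuming states are small float vectors. The first part is random network distillation: a fixed, randomly initialised target network and a trained predictor, with the prediction error used as novelty. The second part is one possible reading of the value-function-loss heuristic, where environments are sampled in proportion to their recent value loss. The class name `RandomNetworkDistillation`, the helper `sample_level`, the single-layer networks, and proportional sampling are all illustrative assumptions, not the speaker's actual implementation.

```python
import math
import random


class RandomNetworkDistillation:
    """Illustrative RND novelty estimator (hypothetical, single linear layers)."""

    def __init__(self, state_dim, out_dim=8, lr=0.05, seed=0):
        rng = random.Random(seed)
        # Fixed, randomly initialised target network: it is never trained.
        self.target_w = [[rng.gauss(0.0, 1.0) for _ in range(state_dim)]
                         for _ in range(out_dim)]
        # Predictor starts at zero and is trained to imitate the target.
        self.pred_w = [[0.0] * state_dim for _ in range(out_dim)]
        self.lr = lr

    def _target(self, s):
        return [math.tanh(sum(w * x for w, x in zip(row, s)))
                for row in self.target_w]

    def _predict(self, s):
        return [sum(w * x for w, x in zip(row, s)) for row in self.pred_w]

    def novelty(self, s):
        """Mean squared prediction error; stays high for rarely visited states."""
        t, p = self._target(s), self._predict(s)
        return sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(t)

    def update(self, s):
        """One SGD step on the mean squared error at a visited state s."""
        t, p = self._target(s), self._predict(s)
        for row, pi, ti in zip(self.pred_w, p, t):
            err = pi - ti
            for j, x in enumerate(s):
                # d/dw of mean_i (p_i - t_i)^2 is 2 * err * x / out_dim
                row[j] -= self.lr * 2.0 * err * x / len(t)


def sample_level(value_losses, rng):
    """Hypothetical curriculum helper: pick an environment with probability
    proportional to its recent value-function loss (an assumed scheme)."""
    levels = sorted(value_losses)
    weights = [value_losses[k] for k in levels]
    if sum(weights) <= 0.0:
        # No signal yet: fall back to uniform sampling.
        return rng.choice(levels)
    return rng.choices(levels, weights=weights, k=1)[0]
```

Repeatedly calling `update` on one state drives its novelty toward zero while an unvisited state keeps a high error, which is the property the transcript relies on. The value-loss sampler needs no extra network at all, which is one way to see why it could be the "much simpler" option.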

Play episode from 11:18
