
Episode 19: Minqi Jiang, UCL, on environment and curriculum design for general RL agents
Generally Intelligent
Random Network Distillation - Is There a Reward?
So what if we thought about single-agent RL as more of a two-agent problem? Essentially, you want a similar type of curriculum, where you play against the environments in which you did poorly. The idea is that at the states you visited more often, your predictor network is going to be better at predicting the random network. So you can basically take the error in predicting the random network at each state as a measure of novelty for that state. But ultimately we came up with a much simpler heuristic, which was basically just the value function loss.
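To make the novelty idea concrete, here is a minimal sketch of random network distillation in NumPy. It is an illustration under stated assumptions, not the setup discussed in the episode: the state dimension, the one-layer tanh target, the linear predictor, and every name in it (`target_features`, `novelty`, `train_predictor`) are hypothetical choices made to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, FEATURE_DIM = 4, 8  # assumed sizes, chosen only for illustration

# Fixed, randomly initialised target network. It is never trained.
W_target = rng.normal(size=(STATE_DIM, FEATURE_DIM))


def target_features(states):
    """Output of the frozen random network for a batch of states."""
    return np.tanh(states @ W_target)


def novelty(states, W_pred):
    """Per-state prediction error of the predictor against the random network."""
    err = states @ W_pred - target_features(states)
    return np.mean(err ** 2, axis=-1)


def train_predictor(states, W_pred, lr=0.1, steps=500):
    """Fit a linear predictor to the random network on visited states only."""
    W = W_pred.copy()
    for _ in range(steps):
        err = states @ W - target_features(states)
        W -= lr * states.T @ err / len(states)  # gradient step on squared error
    return W


# States the agent visits often: a tight cluster around one point.
center = np.array([1.0, 0.0, 0.0, 0.0])
visited = center + 0.1 * rng.normal(size=(256, STATE_DIM))
# States the agent has rarely or never seen: spread widely over the space.
unseen = 3.0 * rng.normal(size=(256, STATE_DIM))

W_init = np.zeros((STATE_DIM, FEATURE_DIM))
W_trained = train_predictor(visited, W_init)

before_score = novelty(visited, W_init).mean()
familiar_score = novelty(visited, W_trained).mean()
novel_score = novelty(unseen, W_trained).mean()
print(f"visited states, before training: {before_score:.4f}")
print(f"visited states, after training:  {familiar_score:.4f}")
print(f"unseen states, after training:   {novel_score:.4f}")
```

Because the predictor is trained only on visited states, its error drops there and stays high elsewhere, so the error can serve as a novelty signal. The simpler heuristic mentioned above would replace this score with the agent's value function loss on each environment.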