Episode 19: Minqi Jiang, UCL, on environment and curriculum design for general RL agents

Generally Intelligent

Random Network Distillation - Is There a Reward?

So what if we thought about single-agent RL as more of a two-agent problem? Essentially you want to do a similar type of curriculum, where you want to play against the environments in which you did poorly. The idea is that at the states you visited more often, your predictor network is going to be better at predicting the random network. So you can basically take the error in predicting the random network at each state as a measure of novelty for that state. But ultimately we came up with a much simpler heuristic, which was basically just the value function loss.
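As a rough illustration of the novelty idea described above, here is a minimal pure-Python sketch, not the speaker's implementation. It assumes a fixed random target network of the form tanh(Wx), a linear predictor trained by plain gradient descent, and one-hot toy states; the names `make_random_network`, `LinearPredictor`, and `novelty` are hypothetical and chosen only for this example.

```python
import math
import random


def make_random_network(in_dim, out_dim, rng):
    """Build a fixed, randomly initialized target network: x -> tanh(W x)."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def forward(x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return forward


class LinearPredictor:
    """Trainable predictor that tries to match the random network's outputs."""

    def __init__(self, in_dim, out_dim):
        self.weights = [[0.0] * in_dim for _ in range(out_dim)]

    def forward(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]

    def update(self, x, target, lr=0.1):
        # One gradient step on the squared prediction error for this state.
        pred = self.forward(x)
        for row, p, t in zip(self.weights, pred, target):
            err = p - t
            for i, xi in enumerate(x):
                row[i] -= lr * err * xi


def novelty(predictor, target_net, x):
    """Mean squared error between predictor and random network at state x."""
    pred = predictor.forward(x)
    tgt = target_net(x)
    return sum((p - t) ** 2 for p, t in zip(pred, tgt)) / len(tgt)


rng = random.Random(0)
target_net = make_random_network(in_dim=4, out_dim=8, rng=rng)
predictor = LinearPredictor(in_dim=4, out_dim=8)

frequent_state = [1.0, 0.0, 0.0, 0.0]  # visited many times
rare_state = [0.0, 1.0, 0.0, 0.0]      # never visited

# The predictor is only trained on states the agent actually visits.
for _ in range(200):
    predictor.update(frequent_state, target_net(frequent_state))

frequent_novelty = novelty(predictor, target_net, frequent_state)
rare_novelty = novelty(predictor, target_net, rare_state)
print(frequent_novelty, rare_novelty)
```

In this toy setup the prediction error at the frequently visited state shrinks toward zero, while the unvisited state keeps a larger error, which is what lets the error act as a novelty signal.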
