
Episode 19: Minqi Jiang, UCL, on environment and curriculum design for general RL agents
Generally Intelligent
The Learning Potential Score for Revisions of Levels
In the original Prioritized Level Replay, or PLR, paper, we basically used the L1 value loss as the score for prioritizing the revisitation of each level. We found that when you actually use this L1 value loss based prioritization on OpenAI Procgen, you end up doing worse than uniform sampling. But then you add the staleness sampling: some percentage of the time, I'm going to sample instead from a staleness distribution that prioritizes levels by their age. So I'm going to sample a level whose score hasn't been updated in a long time. Then if you do like 30% staleness
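As a rough illustration of the mixing described here, the sketch below blends a score-based distribution with a staleness distribution. It is not the PLR implementation: the function name `replay_distribution`, the parameter names, and the choice to normalize scores and ages proportionally are all assumptions made for this example. Only the idea of a per-level score, an age-based staleness distribution, and a 30% staleness share come from the excerpt.

```python
import numpy as np


def replay_distribution(scores, last_updated, current_step, staleness_coef=0.3):
    """Mix score-based and staleness-based sampling over levels.

    Hypothetical helper for illustration only. `scores` stands in for a
    per-level L1 value loss; `last_updated` is the step at which each
    level's score was last refreshed.
    """
    scores = np.asarray(scores, dtype=float)
    last_updated = np.asarray(last_updated, dtype=float)
    n = len(scores)
    uniform = np.full(n, 1.0 / n)

    # Score distribution: proportional to the score (an assumption here;
    # other normalizations are possible). Fall back to uniform if all zero.
    total_score = scores.sum()
    p_score = scores / total_score if total_score > 0 else uniform

    # Staleness distribution: proportional to how long ago each level's
    # score was updated, so older levels are more likely to be revisited.
    ages = current_step - last_updated
    total_age = ages.sum()
    p_stale = ages / total_age if total_age > 0 else uniform

    # Sampling from p_stale with probability staleness_coef and from
    # p_score otherwise is equivalent to sampling from this mixture.
    return (1.0 - staleness_coef) * p_score + staleness_coef * p_stale


# Example: level 1 has a low score but has not been updated for a while.
probs = replay_distribution(
    scores=[0.5, 0.1, 0.4], last_updated=[9, 0, 8], current_step=10
)
rng = np.random.default_rng(0)
level = rng.choice(len(probs), p=probs)
```

With the staleness share set to zero, the result reduces to pure score-based prioritization; raising it gives long-unvisited levels a better chance of having their scores refreshed.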