Episode 19: Minqi Jiang, UCL, on environment and curriculum design for general RL agents

Generally Intelligent

The Learning Potential Score for Revisions of Levels

In the original Prioritized Level Replay, or PLR, paper, we basically used the L1 value loss as the score for prioritizing the revisitation of each level. We found that when you actually use this L1 value-loss-based prioritization on OpenAI Procgen, you end up doing worse than uniform sampling. But then if you add the staleness sampling, I'm going to sample some percentage of the time instead from a staleness distribution that samples by prioritizing for the age of levels. So I'm going to sample a level that hasn't been updated in a long time in terms of its score. Then if you do, like, 30% staleness
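The mixing described here can be sketched roughly in Python. This is only an illustration under stated assumptions, not the paper's implementation: the function names, the plain normalization of scores, and the "age = current step minus last update" staleness measure are all hypothetical choices made for the example. The only idea taken from the conversation is that some fraction of the time (here 30%) the level is drawn by staleness rather than by score.

```python
import random


def replay_distribution(scores, last_updated, step, staleness_coef=0.3):
    """Mix a score-based distribution with a staleness-based one.

    Hypothetical sketch: scores are normalized directly, and staleness
    is measured as the number of steps since a level's score was updated.
    """
    n = len(scores)

    # Score-based distribution (assumption: simple proportional weighting).
    total_score = sum(scores)
    if total_score > 0:
        p_score = [s / total_score for s in scores]
    else:
        p_score = [1.0 / n] * n

    # Staleness distribution: older scores get more probability mass.
    ages = [step - t for t in last_updated]
    total_age = sum(ages)
    if total_age > 0:
        p_stale = [a / total_age for a in ages]
    else:
        p_stale = [1.0 / n] * n

    # Convex combination: staleness_coef of the mass comes from staleness.
    return [
        (1.0 - staleness_coef) * ps + staleness_coef * pa
        for ps, pa in zip(p_score, p_stale)
    ]


def sample_level(scores, last_updated, step, staleness_coef=0.3, rng=random):
    """Draw one level index from the mixed replay distribution."""
    probs = replay_distribution(scores, last_updated, step, staleness_coef)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]


# Example: level 2 has zero score but has not been updated since step 0.
probs = replay_distribution(
    scores=[0.5, 0.5, 0.0], last_updated=[9, 9, 0], step=10
)
```

In this toy example the third level would never be revisited under pure score-based sampling, since its score is zero, but the staleness term still gives it a chance of being replayed so that its score can be refreshed.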
