
Episode 19: Minqi Jiang, UCL, on environment and curriculum design for general RL agents
Generally Intelligent
Game Theory
Don't train on levels where you're just visiting those levels for the first time, and only train when you replay the levels. From a game point of view, it makes the algorithm equivalent to playing against a regret-maximizing adversary. I would definitely be curious to see, like, if you had the oracle values for all the scores at any one time, and then different staleness settings, and see, like, what does that drift look like?
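The rule described in this excerpt (score a level on the first visit, train on it only when it is replayed) can be sketched roughly as follows. This is a minimal illustration, not the method as implemented by the speaker: the function `robust_plr_step`, its parameters, and the callables `sample_new_level`, `estimate_regret`, and `train_on_level` are all hypothetical names, and the greedy "replay the highest-scoring level" choice is a simplifying assumption that stands in for whatever score- and staleness-based sampling is actually used.

```python
import random
from typing import Callable, Dict, Tuple


def robust_plr_step(
    buffer: Dict[int, float],
    sample_new_level: Callable[[], int],
    estimate_regret: Callable[[int], float],
    train_on_level: Callable[[int], None],
    replay_prob: float,
    rng: random.Random,
) -> Tuple[int, bool]:
    """Run one curriculum step and return (level, trained).

    All names here are hypothetical placeholders for illustration.
    """
    if buffer and rng.random() < replay_prob:
        # Replay branch: the level has been seen before, so training is allowed.
        # Assumption: greedily pick the level with the highest regret estimate.
        level = max(buffer, key=buffer.get)
        train_on_level(level)
        # Refresh the stored score after the update.
        buffer[level] = estimate_regret(level)
        return level, True

    # First-visit branch: only score the level; no policy update happens here.
    level = sample_new_level()
    buffer[level] = estimate_regret(level)
    return level, False
```

Because training happens only in the replay branch, every level the agent learns from has already been selected for its high estimated regret, which is the sense in which the level selector behaves like a regret-maximizing adversary.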