
Episode 19: Minqi Jiang, UCL, on environment and curriculum design for general RL agents
Generally Intelligent
How to Maximize Regret
If you're an adversary and you want to maximize regret, you're going to naturally go for the easier things first. For some environments, it might be a little bit trickier. We had an environment that was based on 2D car racing. In that environment, we found that if you train PAIRED on the car racing agent, it's somehow able to overexploit the difference between the protagonist and the antagonist. And so you essentially end up with a very, very poor driving policy for the protagonist. It almost gets to the point where it just can't learn.
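To make the regret signal concrete, here is a minimal sketch of how regret is commonly estimated in the PAIRED setup (Protagonist Antagonist Induced Regret Environment Design): the best antagonist return on an environment minus the average protagonist return on it. The function name and the episode returns are hypothetical, chosen only for illustration; none of this code comes from the episode.

```python
from statistics import mean


def estimate_regret(protagonist_returns, antagonist_returns):
    """Approximate regret on one environment.

    Uses the best antagonist return minus the mean protagonist return.
    Both arguments are lists of episode returns on the same environment.
    """
    return max(antagonist_returns) - mean(protagonist_returns)


# Hypothetical returns on two adversary-proposed tracks (made-up numbers).
# On track_a the antagonist does far better than the protagonist.
track_a = estimate_regret([10.0, 12.0], [40.0, 45.0])
# On track_b both agents do poorly, so the gap between them is small.
track_b = estimate_regret([5.0, 7.0], [8.0, 6.0])

# A regret-maximizing adversary would favor the track with the larger gap.
preferred = "track_a" if track_a > track_b else "track_b"
```

Because the adversary is rewarded only for the gap between the two agents, anything that widens that gap scores well, including environments where the protagonist simply fails to learn.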