
Episode 19: Minqi Jiang, UCL, on environment and curriculum design for general RL agents
Generally Intelligent
PAIRED Is Optimizing for Minimax Regret
PAIRED is essentially optimizing for minimax regret. PAIRED tries to go more bottom up, where it basically tries to design the content that it thinks is going to be best for the student in the first place. In this game with PLR plus PAIRED, you essentially have one teacher designing levels actively, as PAIRED, and a second teacher designing levels adversarially, as PLR. And basically, at equilibrium, the student still reaches a minimax regret policy. It's like some validation that it's not just bad material that no student can learn on.
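
To make "minimax regret" concrete, here is a minimal sketch in Python. It is not code from PAIRED or PLR. The return table, the three candidate policies, and the function names are all hypothetical, and the best achievable return on each level is assumed to be the best return among the listed policies. Regret on a level is that best return minus the student's return, and the minimax regret policy is the one whose worst-case regret across levels is smallest.

```python
# Hypothetical returns: rows are candidate student policies, columns are levels.
RETURNS = [
    [1.0, 0.2, 0.5],  # policy A
    [0.6, 0.7, 0.6],  # policy B
    [0.3, 0.9, 0.1],  # policy C
]


def regret_matrix(returns):
    """Regret of each policy on each level.

    Assumption: the optimal return on a level is approximated by the best
    return any listed policy achieves on that level.
    """
    n_levels = len(returns[0])
    best = [max(row[j] for row in returns) for j in range(n_levels)]
    return [[best[j] - row[j] for j in range(n_levels)] for row in returns]


def minimax_regret_policy(returns):
    """Return (policy index, worst-case regret) for the policy whose
    maximum regret over all levels is smallest."""
    regrets = regret_matrix(returns)
    worst = [max(row) for row in regrets]
    idx = min(range(len(worst)), key=worst.__getitem__)
    return idx, worst[idx]


def adversarial_level(returns, policy_idx):
    """Level a regret-maximizing teacher would pick against one policy."""
    row = regret_matrix(returns)[policy_idx]
    return max(range(len(row)), key=row.__getitem__)


if __name__ == "__main__":
    idx, worst = minimax_regret_policy(RETURNS)
    print("minimax regret policy index:", idx)
    print("its worst-case regret:", round(worst, 3))
    print("teacher's level against it:", adversarial_level(RETURNS, idx))
```

In this toy table the middle policy is never the best on any single level, yet it has the smallest worst-case regret, which is the kind of robustness the minimax regret objective favors.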