Episode 19: Minqi Jiang, UCL, on environment and curriculum design for general RL agents

Generally Intelligent

PAIRED Is Optimizing for Minimax Regret

PAIRED is essentially optimizing for minimax regret. PAIRED tries to go more bottom-up, where it basically tries to design the content that it thinks is going to be [unclear] for the student in the first place. In this game with PLR plus PAIRED, you essentially have one teacher designing levels actively, as in PAIRED, and a second teacher designing levels adversarially, as in PLR. And basically, at equilibrium, the student still reaches a minimax regret policy. It's like some validation that it's not just bad material that no student can learn on.
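
To make "minimax regret" concrete, here is a minimal sketch, not the method discussed in the episode. It assumes a tiny, made-up table of returns (`returns`, hypothetical) for three candidate student policies on three levels, and it approximates the best achievable return on each level by the best candidate in the table. Regret on a level is that best return minus the policy's own return; the minimax regret policy is the one whose worst-case regret across levels is smallest.

```python
import numpy as np

# Hypothetical data: rows are candidate student policies, columns are levels.
returns = np.array([
    [1.0, 0.2, 0.0],   # specialist on level 0
    [0.6, 0.6, 0.5],   # generalist
    [0.0, 0.3, 0.9],   # specialist on level 2
])

def minimax_regret_policy(returns):
    """Return (index of the minimax regret policy, worst-case regret per policy)."""
    # Assumption: the best return on a level is the best among the candidates.
    best_per_level = returns.max(axis=0)
    # Regret of each policy on each level.
    regret = best_per_level - returns
    # An adversarial teacher picks the level that maximizes each policy's regret.
    worst_case = regret.max(axis=1)
    # The student prefers the policy with the smallest worst-case regret.
    return int(worst_case.argmin()), worst_case

best_policy, worst_case = minimax_regret_policy(returns)
print(best_policy, worst_case)
```

In this toy table the generalist wins: each specialist has one level where it falls far short of the best achievable return, and an adversarial teacher would pick exactly that level. A level on which no policy can score well yields low regret for every policy, which matches the point above that a regret-maximizing teacher gains nothing from material no student can learn on.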
