PAIRED Is Optimizing for Minimax Regret

PAIRED is essentially optimizing for minimax regret. PLR tries to go more bottom-up, where it basically tries to design the content that it thinks is going to be good for the student in the first place. In this game with PLR plus PAIRED, you essentially have one teacher actively designing levels, as PAIRED, and a second teacher designing levels adversarially, as PLR. And basically, at equilibrium, the student still reaches a minimax-regret policy. That's some validation that it's not just bad material that no student can learn on.
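A small sketch may help show why a minimax-regret objective avoids "bad material that no student can learn on." It assumes a hypothetical, made-up table of returns for three student policies on three levels. Regret on a level is the best achievable return minus the student's return, and the adversary picks the level with the highest regret. This sketch computes the best return exactly from a fixed set of policies; PAIRED instead estimates it with a second learning agent (the antagonist), so this is only a toy illustration of the objective, not the training algorithm.

```python
from typing import Dict, List

# Hypothetical toy data: RETURNS[policy][level] is a student policy's
# expected return on a level. All numbers are invented for illustration.
RETURNS: Dict[str, Dict[str, float]] = {
    "cautious": {"easy": 0.9, "maze": 0.4, "impossible": 0.0},
    "explorer": {"easy": 0.7, "maze": 0.8, "impossible": 0.0},
    "greedy": {"easy": 1.0, "maze": 0.1, "impossible": 0.0},
}


def levels(returns: Dict[str, Dict[str, float]]) -> List[str]:
    # Every policy is assumed to be scored on the same set of levels.
    return list(next(iter(returns.values())).keys())


def regret(returns: Dict[str, Dict[str, float]], policy: str, level: str) -> float:
    # Regret = best return any policy achieves on this level minus this policy's return.
    best = max(r[level] for r in returns.values())
    return best - returns[policy][level]


def worst_case_regret(returns: Dict[str, Dict[str, float]], policy: str) -> float:
    # The adversarial teacher picks the level that maximizes the student's regret.
    return max(regret(returns, policy, lv) for lv in levels(returns))


def minimax_regret_policy(returns: Dict[str, Dict[str, float]]) -> str:
    # The student picks the policy whose worst-case regret is smallest.
    return min(returns, key=lambda p: worst_case_regret(returns, p))


if __name__ == "__main__":
    for p in RETURNS:
        print(p, round(worst_case_regret(RETURNS, p), 2))
    print("minimax-regret policy:", minimax_regret_policy(RETURNS))
```

With these invented numbers, "explorer" has the smallest worst-case regret (0.3). The "impossible" level gives zero regret to every policy, since nobody can score on it, so an adversary maximizing regret gains nothing by proposing it.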

Excerpt from the episode at 24:22.