
Jakob Foerster

TalkRL: The Reinforcement Learning Podcast


The TMI Problem in Deep Learning

The problem is that the coexistence of agents in the environment during training leads to correlations which can be exploited. In off-belief learning we fundamentally and provably address this. The main insight is that the agents actually never train together. It mathematically takes away the risk, or the ability, of having emergent protocols in multi-agent systems.

But don't we need some type of conventions in Hanabi? For example, if you hint that this card is a one, I should probably assume that you're telling me this because the card is playable, even if it's not obvious from the current state of the board. And then how do conventions emerge gradually at the higher levels of the hierarchy?

You can get this out by...
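To make the idea discussed here concrete, below is a minimal, hedged sketch of the off-belief learning intuition: the learner interprets everything it has observed as if it were generated by a fixed base policy (a random policy at level 0, the previous level's policy higher up), so its beliefs stay grounded and no arbitrary conventions can be exploited. This is not the actual implementation from the paper or the episode; the environment interface (env.sample_hidden_state, env.step_from, env.actions) and the tabular Q-update are hypothetical stand-ins for illustration.

import random
from collections import defaultdict

def obl_level(env, base_policy, episodes=1000, alpha=0.1, gamma=0.99, eps=0.1):
    """Learn a level-k policy while interpreting all observed behaviour as if it
    came from base_policy (level k-1, or a uniformly random policy at level 0).
    Beliefs over the hidden state are grounded in base_policy, so the learner
    cannot pick up arbitrary conventions from its partner's training history."""
    Q = defaultdict(float)
    for _ in range(episodes):
        obs = env.reset()
        done = False
        while not done:
            # Resample a hidden state consistent with the observations under the
            # assumption that the partner played base_policy (hypothetical call).
            hidden = env.sample_hidden_state(obs, assumed_policy=base_policy)
            # Epsilon-greedy action selection for the level-k policy being learned.
            if random.random() < eps:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(obs, a)])
            # Fictitious transition: step the resampled state, not the true one.
            next_obs, reward, done = env.step_from(hidden, action)
            target = reward + (0.0 if done else gamma * max(Q[(next_obs, a)] for a in env.actions))
            Q[(obs, action)] += alpha * (target - Q[(obs, action)])
            obs = next_obs
    return Q

In this sketch, conventions can only re-enter gradually: each higher level is trained against the beliefs induced by the level below it, which matches the hierarchy mentioned in the snippet.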

