Jakob Foerster

TalkRL: The Reinforcement Learning Podcast

CHAPTER

The TMI Problem in Deep Learning

The problem is that the coexistence of agents in the environment during training leads to correlations which can be exploited. In off-belief learning we fundamentally, and provably, address this. The main insight is that the agents, in effect, never train together: it mathematically takes away the risk, or the ability, of having emergent protocols in multi-agent systems.

But don't we need some type of convention in Hanabi? For example, if you hint that this card is a one, I should probably assume you're telling me this because the card is playable, even if that's not obvious from the current state of the board. And then there's the question of how conventions emerge gradually at the higher levels of the hierarchy. You can get this out by...
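For context: the algorithm under discussion is off-belief learning (OBL; Hu et al., 2021). A compact way to state the insight, paraphrasing that paper rather than the episode itself, is that the learning agent interprets all past actions as if a fixed base policy $\pi_0$ (for example, a uniformly random policy) had produced them, while its own future play follows the policy being trained, $\pi_1$:

$$ V_{\pi_0 \to \pi_1}(\tau^i_t) \;=\; \mathbb{E}_{\tau_t \sim \mathcal{B}_{\pi_0}(\cdot \mid \tau^i_t)} \left[ \mathbb{E}_{\pi_1} \Big[ \textstyle\sum_{t' \ge t} r_{t'} \,\Big|\, \tau_t \Big] \right] $$

Here $\tau^i_t$ is agent $i$'s action-observation history and $\mathcal{B}_{\pi_0}$ is the belief over full trajectories induced by assuming everyone played $\pi_0$. Because every past action is already explained by $\pi_0$, the trained policy $\pi_1$ can attach no hidden meaning to it, which is what rules out emergent protocols. Applying the operator repeatedly, $\pi_{k+1} = \mathrm{OBL}(\pi_k)$, yields the hierarchy of levels at which grounded conventions can gradually re-emerge, as described at the end of the excerpt.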
