3min chapter

Jakob Foerster

TalkRL: The Reinforcement Learning Podcast

CHAPTER

The TMI Problem in Deep Learning

The problem is that the coexistence of agents in the environment during training leads to correlations which can be exploited. In off-belief learning we fundamentally and provably address this. The main insight is that the agents actually never train. It really, mathematically, takes away the risk, or the ability, of having emergent protocols in multi-agent systems. But don't we need some type of conventions in Hanabi? For example, if you hint that this card is a one, I should probably assume that you're telling me this because the card is playable, even if it's not obvious from the current state of the board. And then there's the question of how conventions emerge gradually at the higher levels of the hierarchy. You can get this out by...
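To make the core mechanism a bit more concrete, here is a minimal toy sketch, not the paper's implementation: in off-belief learning the learning agent forms its belief over hidden information by interpreting the partner's past actions as if they were produced by a fixed base policy pi_0 (for example, a uniform random policy), so those actions cannot carry secret conventions. The function names and the belief computation below are illustrative assumptions.

```python
import numpy as np

# Toy sketch of the off-belief-learning belief update (hypothetical setup).
# Past partner actions are interpreted as if drawn from a FIXED base policy
# pi_0, so no emergent convention can be read into them.

def belief_under_base_policy(possible_hidden_states, observed_actions, base_policy_prob):
    """Posterior over hidden states, assuming past actions came from pi_0.

    possible_hidden_states: candidate hidden states (e.g., possible partner hands)
    observed_actions: list of (context, action) pairs observed so far
    base_policy_prob(hidden_state, context, action): P(action | hidden_state, context) under pi_0
    """
    weights = []
    for h in possible_hidden_states:
        w = 1.0
        for context, action in observed_actions:
            # Likelihood of each observed action under the fixed base policy,
            # NOT under whatever conventions the current learned policy might have.
            w *= base_policy_prob(h, context, action)
        weights.append(w)
    weights = np.array(weights, dtype=float)
    total = weights.sum()
    # Normalize to a proper posterior; fall back to uniform if all weights vanish.
    return weights / total if total > 0 else np.full(len(weights), 1.0 / len(weights))
```

Because pi_0 is fixed and convention-free, a hint like "this card is a one" only conveys its literal, grounded meaning under this belief update; any extra meaning would have to be reintroduced deliberately at a higher level of the hierarchy.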
