The TMI Problem in Deep Learning
The problem is that the coexistence of agents in the environment during training leads to correlations that can be exploited. In off-belief learning we fundamentally, and provably, address this. The main insight is that the agents never effectively train together: each agent interprets its partner's past actions as if they had been produced by a fixed base policy rather than by the policy being learned. That mathematically takes away the ability to form emergent protocols in multi-agent systems.

But don't we need some kind of conventions in Hanabi? For example, if you hint that this card is a one, I should probably assume you're telling me this because the card is playable, even if that isn't obvious from the current state of the board. And that's where conventions emerge gradually at the higher levels of the hierarchy. You can get this out by iterating off-belief learning: the level-k policy is trained against the beliefs induced by the level-(k-1) policy, so each new convention stays grounded in the behavior of the level below.
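To make that grounding concrete, here is a minimal Python sketch of off-belief learning at level 1 on a toy one-step signaling game; the game itself, the names (`pi_0`, `belief_under`, `train_obl`), and the Monte Carlo belief estimate are illustrative assumptions, not the paper's implementation.

```python
import random

# Toy signaling game: a hidden card c in {0, 1}. A hinter sees c and sends
# hint h in {0, 1}; a guesser sees only h and guesses g, scoring 1 if g == c.
# Plain self-play can lock into an arbitrary secret protocol (h == c, or
# h == 1 - c). OBL blocks this: the guesser interprets h as if it had been
# produced by a fixed base policy pi_0, not by the policy being trained.

CARDS = HINTS = GUESSES = (0, 1)

def pi_0(card):
    """Fixed base policy: hints uniformly at random, ignoring the card."""
    return random.choice(HINTS)

def belief_under(policy, hint, n_samples=10_000):
    """Posterior over the hidden card given the hint, computed *as if*
    the hint came from `policy` (the fictitious past used by OBL)."""
    counts = {c: 0 for c in CARDS}
    for _ in range(n_samples):
        c = random.choice(CARDS)
        if policy(c) == hint:
            counts[c] += 1
    total = sum(counts.values()) or 1
    return {c: counts[c] / total for c in CARDS}

def obl_level1_guess(hint):
    """Best response to the pi_0-induced belief. Because pi_0 ignores the
    card, the belief is (near-)uniform: no hint convention can pay off."""
    belief = belief_under(pi_0, hint)
    return max(GUESSES, key=lambda g: belief[g])

def obl_hierarchy(train_obl, num_levels):
    """Iterated OBL: the level-k policy is trained against beliefs induced
    by the level-(k-1) policy, so conventions can emerge gradually, each
    grounded in the level below. `train_obl` stands in for a full RL loop."""
    policy = pi_0
    for _ in range(num_levels):
        policy = train_obl(belief_policy=policy)
    return policy

if __name__ == "__main__":
    # Roughly {0: 0.5, 1: 0.5}: at level 1 the hint is uninformative.
    print(belief_under(pi_0, hint=1))
```

The crux is the belief computation: because the fictitious past is generated by `pi_0`, a level-1 hint carries no hidden meaning, and any convention a higher level develops has to be grounded in the observable behavior of the level below it.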