The Challenges of Multi-Agent Learning

The central problems of multi-agent learning are non-stationarity and equilibrium selection. Naive learning methods have a strong bias towards solutions that lead to radically bad outcomes for all agents in the environment, such as defecting unconditionally in all situations. To make matters worse, credit assignment becomes extremely hard: the actions that one agent takes in an episode can change the data that enters the replay buffer, and therefore the training, of another agent. One example is the iterated prisoner's dilemma, where there are many different possible Nash equilibria that could be reached during training.
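To make the equilibrium selection point concrete, here is a minimal sketch in Python. It is not from the episode: the payoff values (5, 3, 1, 0), the 100-round horizon, and the four hand-written strategies are all assumptions chosen for illustration. The check is restricted to that small strategy set, so it only shows that mutual defection and mutual cooperation can both be stable outcomes with very different payoffs, not a full analysis of the game.

```python
from itertools import product

# Assumed prisoner's dilemma payoffs: (row player, column player).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# Illustrative strategies: each maps (own history, opponent history) to a move.
def all_d(mine, theirs):
    return "D"

def all_c(mine, theirs):
    return "C"

def tit_for_tat(mine, theirs):
    return theirs[-1] if theirs else "C"

def grim(mine, theirs):
    return "D" if "D" in theirs else "C"

STRATEGIES = {f.__name__: f for f in (all_d, all_c, tit_for_tat, grim)}

def play(s1, s2, rounds=100):
    """Return the total payoffs of two strategies over a fixed number of rounds."""
    h1, h2 = [], []
    total1 = total2 = 0
    for _ in range(rounds):
        a1, a2 = s1(h1, h2), s2(h2, h1)
        p1, p2 = PAYOFF[(a1, a2)]
        total1, total2 = total1 + p1, total2 + p2
        h1.append(a1)
        h2.append(a2)
    return total1, total2

def nash_equilibria(rounds=100):
    """Strategy pairs where neither player gains by switching within STRATEGIES."""
    names = list(STRATEGIES)
    u = {(a, b): play(STRATEGIES[a], STRATEGIES[b], rounds)
         for a, b in product(names, repeat=2)}
    found = []
    for a, b in product(names, repeat=2):
        row_ok = all(u[(a, b)][0] >= u[(x, b)][0] for x in names)
        col_ok = all(u[(a, b)][1] >= u[(a, y)][1] for y in names)
        if row_ok and col_ok:
            found.append((a, b, u[(a, b)]))
    return found

if __name__ == "__main__":
    for a, b, payoffs in nash_equilibria():
        print(f"{a:12s} vs {b:12s} -> {payoffs}")
```

Under these assumptions, both players always defecting is an equilibrium worth 100 points each, while both playing tit-for-tat is an equilibrium worth 300 each. Which of these a pair of learners ends up in is the equilibrium selection problem described above.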
