
Jakob Foerster

TalkRL: The Reinforcement Learning Podcast

CHAPTER

The Challenges of Multi-Agent Learning

The core problems of multi-agent learning are non-stationarity and equilibrium selection. Naive learning methods have a strong bias towards solutions that lead to radically bad outcomes for all agents in the environment, such as defecting unconditionally in all situations. To make matters worse, we get extremely hard credit assignment problems, whereby the actions that an agent takes in an episode can suddenly change the data that enters the replay buffer or the training code of another agent. One example of this is playing the iterated prisoner's dilemma, where there are many different possible Nash equilibria that could be reached during training.
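
To make the equilibrium selection point concrete, here is a minimal sketch that enumerates pure-strategy equilibria of an iterated prisoner's dilemma over a small, hand-picked set of strategies. None of this comes from the episode: the payoff values, the discount factor, the truncated horizon, and the four strategies are illustrative assumptions, and "equilibrium" here only means that neither player can gain by switching to another strategy within this restricted set.

```python
from itertools import product

# Per-round payoffs (player 1, player 2). These numbers are an assumption
# chosen for illustration; any prisoner's dilemma ordering would do.
PAYOFF = {
    ("C", "C"): (-1, -1),
    ("C", "D"): (-3, 0),
    ("D", "C"): (0, -3),
    ("D", "D"): (-2, -2),
}
GAMMA = 0.96   # assumed discount factor
ROUNDS = 400   # assumed horizon, long enough that truncation is negligible


def always_cooperate(own, opp):
    return "C"


def always_defect(own, opp):
    return "D"


def tit_for_tat(own, opp):
    # Cooperate first, then copy the opponent's previous action.
    return opp[-1] if opp else "C"


def grim_trigger(own, opp):
    # Cooperate until the opponent defects once, then defect forever.
    return "D" if "D" in opp else "C"


# Hypothetical, hand-picked strategy set for this sketch.
STRATEGIES = {
    "always_cooperate": always_cooperate,
    "always_defect": always_defect,
    "tit_for_tat": tit_for_tat,
    "grim_trigger": grim_trigger,
}


def discounted_returns(s1, s2):
    """Play s1 against s2 and return both players' discounted returns."""
    h1, h2 = [], []
    r1 = r2 = 0.0
    for t in range(ROUNDS):
        a1, a2 = s1(h1, h2), s2(h2, h1)
        p1, p2 = PAYOFF[(a1, a2)]
        r1 += GAMMA ** t * p1
        r2 += GAMMA ** t * p2
        h1.append(a1)
        h2.append(a2)
    return r1, r2


def pure_equilibria():
    """Strategy pairs where neither player gains by switching within the set."""
    names = list(STRATEGIES)
    # value[(a, b)] is the return of a player using a against b.
    # The game is symmetric, so one table covers both players.
    value = {
        (a, b): discounted_returns(STRATEGIES[a], STRATEGIES[b])[0]
        for a, b in product(names, repeat=2)
    }
    found = []
    for a, b in product(names, repeat=2):
        a_is_best = all(value[(a, b)] >= value[(d, b)] - 1e-9 for d in names)
        b_is_best = all(value[(b, a)] >= value[(d, a)] - 1e-9 for d in names)
        if a_is_best and b_is_best:
            found.append((a, b))
    return found


for pair in pure_equilibria():
    print(pair)
```

Under these assumptions, mutual unconditional defection shows up as an equilibrium alongside several reciprocal pairs built from tit-for-tat and grim trigger. Which of these a pair of learners ends up in is the equilibrium selection problem described above.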
