3min chapter


Jakob Foerster

TalkRL: The Reinforcement Learning Podcast

CHAPTER

The Challenges of Multi-Agent Learning

The core problems of multi-agent learning are non-stationarity and equilibrium selection. Naive learning methods have a strong bias towards solutions that lead to radically bad outcomes for all agents in the environment, such as defecting unconditionally in all situations. To make matters worse, we get extremely hard credit assignment problems, whereby the actions that an agent takes in an episode can change the data that enters the replay buffer or the training code of another agent. One example is the iterated prisoner's dilemma, in which there are many different Nash equilibria that could be reached during training.
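The equilibrium selection point can be made concrete with a small sketch. The Python below is an illustration only, not code from the episode: the payoff values (5, 3, 1, 0), the discount factor 0.96, the 200-step truncation, and the three-strategy set are all assumptions chosen for the example. It plays each pair of strategies against each other and lists the pairs from which neither player gains by switching to another strategy in the set.

```python
from itertools import product

# Assumed textbook payoffs for the first player; not taken from the episode.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
GAMMA = 0.96   # assumed discount factor
HORIZON = 200  # assumed truncation of the infinitely repeated game

# Memory-one strategies: (first move, reply to opponent's C, reply to opponent's D).
STRATEGIES = {
    "always_cooperate": ("C", "C", "C"),
    "always_defect": ("D", "D", "D"),
    "tit_for_tat": ("C", "C", "D"),
}


def discounted_returns(s1, s2):
    """Play s1 against s2 and return both players' discounted returns."""
    a1, a2 = s1[0], s2[0]
    r1 = r2 = 0.0
    for t in range(HORIZON):
        r1 += GAMMA**t * PAYOFF[(a1, a2)]
        r2 += GAMMA**t * PAYOFF[(a2, a1)]
        # Each player reacts to the opponent's previous action.
        a1, a2 = (s1[1] if a2 == "C" else s1[2]), (s2[1] if a1 == "C" else s2[2])
    return r1, r2


def pure_nash_equilibria():
    """Strategy pairs where no player gains by deviating within STRATEGIES."""
    names = list(STRATEGIES)
    value = {
        (a, b): discounted_returns(STRATEGIES[a], STRATEGIES[b])
        for a, b in product(names, repeat=2)
    }
    equilibria = []
    for a, b in product(names, repeat=2):
        best1 = max(value[(x, b)][0] for x in names)
        best2 = max(value[(a, y)][1] for y in names)
        if value[(a, b)][0] >= best1 - 1e-9 and value[(a, b)][1] >= best2 - 1e-9:
            equilibria.append((a, b))
    return equilibria


if __name__ == "__main__":
    for pair in pure_nash_equilibria():
        print(pair, discounted_returns(*(STRATEGIES[n] for n in pair)))
```

Under these assumptions, both mutual unconditional defection and mutual tit-for-tat survive the check, yet the first yields a return of roughly 25 per player and the second roughly 75. The check only covers deviations within the three listed strategies, so it illustrates the multiplicity of equilibria rather than characterizing the full game.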
