David Silver, a lead researcher at DeepMind, dives into the revolutionary world of reinforcement learning, having pioneered breakthroughs with AlphaGo and AlphaZero. He shares his journey from childhood programming to mastering AI strategies, highlighting the complexities of the game Go. The conversation explores the transformative power of self-play in AI learning, the emotional impact of AlphaGo's historic win against Lee Sedol, and the philosophical implications of defining rewards in artificial systems. Silver's insights challenge our understanding of intelligence in machines.
01:48:28
ANECDOTE
First Program
David Silver's first program displayed his name in various colors on a BBC Model B microcomputer.
His father, inspired by the machine, shifted careers to study AI, exposing young David to Prolog and family tree queries.
ANECDOTE
Early Game AI
During his work in the games industry, David Silver created AIs that could outperform him in specific scenarios.
However, these handcrafted agents relied on speed and pattern exploitation rather than true intelligence.
ANECDOTE
First Go Program
David Silver's first Go program, built using reinforcement learning, learned by trial and error and self-play.
This program eventually surpassed his own Go skills, giving him satisfaction that a self-learning system could achieve such a feat.
This second edition of 'Reinforcement Learning: An Introduction' by Richard S. Sutton and Andrew G. Barto provides a clear and simple account of the field's key ideas and algorithms. The book is significantly expanded and updated, including new topics such as artificial neural networks, the Fourier basis, and expanded treatment of off-policy learning and policy-gradient methods. It also includes new chapters on the relationships between reinforcement learning and psychology/neuroscience, as well as updated case studies on AlphaGo, AlphaGo Zero, Atari game playing, and IBM Watson's wagering strategy. The final chapter discusses the future societal impacts of reinforcement learning.
David Silver leads the reinforcement learning research group at DeepMind. He was lead researcher on AlphaGo and AlphaZero, co-lead on AlphaStar, and has worked on MuZero and a lot of other important work in reinforcement learning.
This conversation is part of the Artificial Intelligence podcast. If you would like to get more information about this podcast go to https://lexfridman.com/ai or connect with @lexfridman on Twitter, LinkedIn, Facebook, Medium, or YouTube where you can watch the video versions of these conversations. If you enjoy the podcast, please rate it 5 stars on Apple Podcasts, follow on Spotify, or support it on Patreon.
Here’s the outline of the episode. On some podcast players you should be able to click the timestamp to jump to that time.
OUTLINE:
00:00 – Introduction
04:09 – First program
11:11 – AlphaGo
21:42 – Rules of the game of Go
25:37 – Reinforcement learning: personal journey
30:15 – What is reinforcement learning?
43:51 – AlphaGo (continued)
53:40 – Supervised learning and self-play in AlphaGo
1:06:12 – Lee Sedol's retirement from Go play
1:08:57 – Garry Kasparov
1:14:10 – AlphaZero and self-play
1:31:29 – Creativity in AlphaZero
1:35:21 – AlphaZero applications
1:37:59 – Reward functions
1:40:51 – Meaning of life