Noam Brown, who leads the multi-agent team at OpenAI, shares insights from his groundbreaking work in AI, especially in competitive strategy games like poker and Diplomacy. He discusses the fascinating impact of AI on human gameplay and critiques the constraints of the System 1/2 thinking model in AI reasoning. The conversation also touches on the challenges of test-time compute limitations, multi-agent intelligence, and innovative applications of AI tools like Codex and Windsurf, while pondering the future of AI civilizations.
Noam’s Diplomacy Journey
Noam Brown improved his Diplomacy gameplay by deeply studying the game and learning from his AI bot Cicero's unique moves.
This process helped him win the 2025 World Diplomacy Championship several years after releasing Cicero.
LLM Advances Enable Realistic Bots
Early Diplomacy bots were limited by the language quality of their models, which produced hallucinations and inconsistencies.
Modern large language models now pass the Turing test, making bots much harder to distinguish from humans.
System 2 Needs Strong System 1
The benefits of reasoning (System 2) emerge only once a model has sufficient System 1 capability.
Early, smaller models gained much less from chain-of-thought prompting than larger models did.
In Thinking, Fast and Slow, Daniel Kahneman takes readers on a tour of the mind, explaining how the two systems of thought shape our judgments and decisions. System 1 is fast, automatic, and emotional, while System 2 is slower, effortful, and logical. Kahneman discusses the impact of cognitive biases, the difficulties of predicting future happiness, and the effects of overconfidence on corporate strategies. He offers practical insights into how to guard against mental glitches and how to benefit from slow thinking in both personal and business life. The book also explores the distinction between the 'experiencing self' and the 'remembering self' and their roles in our perception of happiness.
Solving Poker and Diplomacy, Debating RL+Reasoning with Ilya, what's *wrong* with the System 1/2 analogy, and where Test-Time Compute hits a wall
Timestamps
00:00 Intro – Diplomacy, Cicero & World Championship
02:00 Reverse Centaur: How AI Improved Noam’s Human Play
05:00 Turing Test Failures in Chat: Hallucinations & Steerability
07:30 Reasoning Models & Fast vs. Slow Thinking Paradigm
11:00 System 1 vs. System 2 in Visual Tasks (GeoGuessr, Tic-Tac-Toe)
14:00 The Deep Research Existence Proof for Unverifiable Domains
17:30 Harnesses, Tool Use, and Fragility in AI Agents
21:00 The Case Against Over-Reliance on Scaffolds and Routers
24:00 Reinforcement Fine-Tuning and Long-Term Model Adaptability
28:00 Ilya’s Bet on Reasoning and the O-Series Breakthrough
34:00 Noam’s Dev Stack: Codex, Windsurf & AGI Moments
38:00 Building Better AI Developers: Memory, Reuse, and PR Reviews
41:00 Multi-Agent Intelligence and the “AI Civilization” Hypothesis
44:30 Implicit World Models and Theory of Mind Through Scaling
48:00 Why Self-Play Breaks Down Beyond Go and Chess
54:00 Designing Better Benchmarks for Fuzzy Tasks
57:30 The Real Limits of Test-Time Compute: Cost vs. Time
1:00:30 Data Efficiency Gaps Between Humans and LLMs
1:03:00 Training Pipeline: Pretraining, Midtraining, Posttraining
1:05:00 Games as Research Proving Grounds: Poker, MTG, Stratego
1:10:00 Closing Thoughts – Five-Year View and Open Research Directions
Chapters
00:00:00 Intro & Guest Welcome
00:00:33 Diplomacy AI & Cicero Insights
00:03:49 AI Safety, Language Models, and Steerability
00:05:23 O Series Models: Progress and Benchmarks
00:08:53 Reasoning Paradigm: Thinking Fast and Slow in AI
00:14:02 Design Questions: Harnesses, Tools, and Test Time Compute
00:20:32 Reinforcement Fine-tuning & Model Specialization
00:21:52 The Rise of Reasoning Models at OpenAI
00:29:33 Data Efficiency in Machine Learning
00:33:21 Coding & AI: Codex, Workflows, and Developer Experience
00:41:38 Multi-Agent AI: Collaboration, Competition, and Civilization
00:45:14 Poker, Diplomacy & Exploitative vs. Optimal AI Strategy
00:52:11 World Models, Multi-Agent Learning, and Self-Play
00:58:50 Generative Media: Image & Video Models
01:00:44 Robotics: Humanoids, Iteration Speed, and Embodiment
01:04:25 Rapid Fire: Research Practices, Benchmarks, and AI Progress
01:14:19 Games, Imperfect Information, and AI Research Directions