Nate Soares, president of the Machine Intelligence Research Institute, dives deep into the existential risks posed by superhuman AI. He explores the opacity of modern AI systems and why their unpredictability could make them more dangerous than nuclear weapons. The conversation touches on whether large language models are merely clever predictors or evolving minds, and on the challenges of aligning AI goals with human values. Soares proposes a treaty to curb the race toward superintelligent AI, inviting listeners to confront these pressing global threats.
Duration: 01:37:07
INSIGHT
AI Could Be More Dangerous Than Nukes
Superhuman AI could autonomously build more powerful weapons and pursue its own goals beyond human control.
Nate Soares argues such AI would be more dangerous than nuclear weapons because it can self-direct and self-improve.
INSIGHT
AIs Are Grown, Not Handwritten
Modern deep learning models are 'grown' via large-scale training rather than handcrafted code, making failures hard to debug.
Nate Soares warns that we can't easily inspect or patch the internal causes of surprising model behaviors.
Jim talks with Nate Soares about the ideas in his and Eliezer Yudkowsky's book If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All. They discuss the book's claim that mitigating existential AI risk should be a top global priority, the idea that LLMs are grown, the opacity of deep learning networks, the Golden Gate activation vector, whether our understanding of deep learning networks might improve enough to prevent catastrophe, goodness as a narrow target, the alignment problem, the problem of pointing minds, whether LLMs are just stochastic parrots, why predicting a corpus often requires more mental machinery than creating a corpus, depth & generalization of skills, wanting as an effective strategy, goal orientation, limitations of training goal pursuit, transient limitations of current AI, protein folding and AlphaFold, the riskiness of automating alignment research, the correlation between capability and more coherent drives, why the authors anchored their argument on transformers & LLMs, the inversion of Moravec's paradox, the geopolitical multipolar trap, making world leaders aware of the issues, a treaty to ban the race to superintelligence, the specific terms of the proposed treaty, a comparison with banning uranium enrichment, why Jim tentatively thinks this proposal is a mistake, a priesthood of the power supply, whether attention is a zero-sum game, and much more.
Episode Transcript
If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All, by Eliezer Yudkowsky and Nate Soares
"Psyop or Insanity or ...? Peter Thiel, the Antichrist, and Our Collapsing Epistemic Commons," by Jim Rutt
"On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback," by Marcus Williams et al.
"Attention Sinks and Compression Valleys in LLMs are Two Sides of the Same Coin," by Enrique Queipo-de-Llano et al.
JRS EP 217 - Ben Goertzel on a New Framework for AGI
"A Tentative Draft of a Treaty, With Annotations"
Nate Soares is the President of the Machine Intelligence Research Institute. He has been working in the field for over a decade, after previous experience at Microsoft and Google. Soares is the author of a large body of technical and semi-technical writing on AI alignment, including foundational work on value learning, decision theory, and power-seeking incentives in smarter-than-human AIs.