RSam Podcast

Mechanistic Interpretability and How LLMs Understand

Jan 10, 2026
Dr. Matthieu Queloz, a Privatdozent at the University of Bern who writes on conceptual ethics, joins Pierre Beckmann, an AI researcher and PhD student specializing in neuro-symbolic AI. They dive into the philosophy of deep learning, exploring how LLMs represent features and concepts. The pair discuss the advantages of language-centered AI and mechanistic interpretability, emphasizing high-dimensional feature packing and the potential for LLMs to form partial world models. They also examine the social functions of understanding and the need to adapt this concept for AI.
INSIGHT

Reframe Understanding Beyond Humans

  • The mechanistic study of LLMs forces us to decouple understanding from its human-specific forms.
  • Investigating LLM internals reveals new conceptions of how non-human intelligences grasp meaning.
INSIGHT

Features As Latent Directions

  • Mechanistic interpretability finds coherent internal representations called features or concepts.
  • These features often appear as directions in the high-dimensional latent spaces models use, as the sketch below illustrates.
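A minimal NumPy sketch of the feature-as-direction idea. The hidden-state width, the activation, and the feature direction are all hypothetical placeholders (real feature directions are typically recovered with linear probes or sparse autoencoders); the point is only that "reading" a feature amounts to projecting an activation onto a direction.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 512                       # hypothetical hidden-state width
hidden = rng.normal(size=d_model)   # stand-in for one token's activation vector

# A "feature" modelled as a unit-length direction in the latent space.
# Here it is a random placeholder, not a direction from any real model.
feature_dir = rng.normal(size=d_model)
feature_dir /= np.linalg.norm(feature_dir)

# Reading the feature off the activation is a projection (dot product):
# how strongly the hidden state points along this direction.
activation_strength = hidden @ feature_dir
print(f"feature activation: {activation_strength:.3f}")
```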
INSIGHT

Superposition Explains Massive Feature Capacity

  • Superposition lets many features coexist in high-dimensional layers with little interference.
  • The capacity to pack exponentially many feature directions explains models' rich concept repertoires; the sketch below illustrates the underlying geometry.
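A small sketch of the geometry behind superposition, with illustrative numbers (a hypothetical 512-dimensional layer and 2,000 candidate feature directions, none taken from the episode): random unit vectors in high dimensions are nearly orthogonal, so far more feature directions than dimensions can coexist with little pairwise interference.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 512             # hypothetical layer width
n_features = 2000   # many more candidate feature directions than dimensions

# Random unit vectors in d dimensions.
dirs = rng.normal(size=(n_features, d))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

# Interference between two features = |cosine similarity| of their directions.
cos = dirs @ dirs.T
np.fill_diagonal(cos, 0.0)

print(f"{n_features} directions packed into {d} dimensions")
print(f"mean |cos| between distinct pairs: {np.abs(cos).mean():.3f}")
print(f"max  |cos| between distinct pairs: {np.abs(cos).max():.3f}")
```

The mean interference comes out around 1/sqrt(d) ≈ 0.04 here, which is why a layer can host many more nearly orthogonal feature directions than it has dimensions.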