

“Large Language Models and the Critical Brain Hypothesis” by David Africa
Summary: I argue for a picture of developmental interpretability drawn from neuroscience. A useful way to study and control frontier-scale language models is to treat their training as a sequence of physical phase transitions. Singular learning theory (SLT) rigorously characterizes singularities and learning dynamics in overparameterized models; developmental interpretability hypothesizes that large networks pass through phase transitions, regimes where behaviour changes qualitatively. The Critical Brain Hypothesis (CBH) proposes that biological neural systems may operate near phase transitions, and I claim that modern language models behave similarly. Some transformer phenomena may be closely analogous: in-context learning and long-range dependency handling strengthen with scale and training distribution (sometimes appearing threshold-like), short fine-tunes or jailbreak prompts can substantially shift behaviour on certain evaluations, and “grokking” shows delayed generalization with qualitative similarities to critical slowing-down. Finally, I outline tests and implications under this speculative lens.
Phase Transitions
Developmental interpretability proposes that we use [...]
---
Outline:
(01:19) Phase Transitions
(02:22) The Critical Brain Hypothesis (CBH) and Alignment
(06:55) Critical Daydreaming
(08:33) Critical Surfing and some practical thoughts
(11:20) Collaborate with us
The original text contained 4 footnotes which were omitted from this narration.
---
First published:
September 9th, 2025
---
Narrated by TYPE III AUDIO.