
#131 Toby Ord - Will AI Destroy Humanity?
Within Reason
Detection, interpretability, and scheming
Toby explains chain-of-thought access, interpretability limits, and detectors that models can evade.
Starts at 21:30