
[HUMAN VOICE] "How useful is mechanistic interpretability?" by ryan_greenblatt, Neel Nanda, Buck, habryka
LessWrong (Curated & Popular)
Challenges of Achieving Interpretability in Transformative Models
This chapter explores the difficulties of achieving interpretability in models such as GPT-2 and GPT-4. The speakers discuss the limitations of a proposed metric for measuring loss recovery and the importance of mechanistic interpretability. They also consider the threshold for model differences and safety concerns about explanations.