
[HUMAN VOICE] "How useful is mechanistic interpretability?" by ryan_greenblatt, Neel Nanda, Buck, habryka

LessWrong (Curated & Popular)


Challenges of Achieving Interpretability in Transformative Models

The chapter explores the difficulties of achieving interpretability in models such as GPT-4 and GPT-2, discussing the limitations of a proposed metric for measuring loss recovery and the importance of mechanistic interpretability. It also covers the threshold for model differences and safety concerns in explanations.

