
40 - Jason Gross on Compact Proofs and Interpretability
AXRP - the AI X-risk Research Podcast
Mechanistic Interpretability Unpacked
This chapter explores the balance between overestimation and underestimation in mechanistic interpretability and proposes a new scale for assessing how well explanations work. It emphasizes the difficulty of aligning theoretical claims with neural network parameters and introduces crosscoders as a key advance toward greater clarity in the field.