

Mechanistic Interpretability: Philosophy, Practice & Progress with Goodfire's Dan Balsam & Tom McGrath
May 29, 2025
In a thought-provoking discussion, Dan Balsam, CTO of Goodfire, and Tom McGrath, Chief Scientist, dive into the exciting world of mechanistic interpretability in AI. They analyze how understanding neural networks can spark breakthroughs in scientific discovery and creative domains. The pair tackle challenges in natural language processing and model debugging, drawing fascinating parallels with biology. Additionally, they underscore the importance of funding and innovative approaches in advancing AI explainability, paving the way for a more transparent future.
AI Snips
Interpretability as Empirical Science
- Interpretability relies heavily on rich empirical data from models' internal activations.
- Progress is like natural science: observing phenomena and forming hypotheses gradually.
Sparse Autoencoders as Microscopes
- Sparse autoencoders (SAEs) act as a reductive sensor into a model's internal activations, trading some reconstruction fidelity for sparse, interpretable features (see the sketch after this snip).
- Improving SAEs and building experimental scaffolding around them is key to forming better abstractions of models and advancing interpretability.
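As a rough illustration of the SAE idea discussed here, the following is a minimal PyTorch sketch, not Goodfire's implementation: the class name, dimensions, and the `l1_coeff` sparsity penalty are illustrative assumptions. The loss makes the reconstruction-versus-sparsity trade-off explicit.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy sparse autoencoder over model activations (illustrative sizes)."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        # Sparse feature activations: ReLU keeps most features at zero.
        f = torch.relu(self.encoder(x))
        # Reconstruction of the original activation vector.
        x_hat = self.decoder(f)
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coeff=1e-3):
    # Trade-off: reconstruction fidelity vs. sparsity of the learned features.
    recon = (x - x_hat).pow(2).mean()
    sparsity = f.abs().mean()
    return recon + l1_coeff * sparsity

# Usage sketch: encode a batch of residual-stream activations.
sae = SparseAutoencoder(d_model=512, d_hidden=4096)
x = torch.randn(8, 512)            # stand-in for captured activations
x_hat, f = sae(x)
loss = sae_loss(x, x_hat, f)
```

Raising `l1_coeff` drives features sparser (and more interpretable) at the cost of higher reconstruction error, which is the trade-off the snip refers to.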
Interpretability's Proto-Paradigm
- Mechanistic interpretability is now proto-paradigmatic, not pre-paradigmatic.
- There is growing consensus that features are linear directions that compose into circuits, with superposition letting a model represent more concepts than it has dimensions (a toy illustration follows below).
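To make the superposition point concrete, here is a toy NumPy sketch (an illustration assumed for this write-up, not an example from the episode): three feature directions are packed into a two-dimensional space, so each feature can still be read out linearly but with slight interference from the others.

```python
import numpy as np

# Toy superposition: 3 feature directions packed into 2 dimensions.
# The directions cannot all be orthogonal, so features interfere slightly,
# but each can still be read out approximately with a dot product.
angles = np.array([0, 2 * np.pi / 3, 4 * np.pi / 3])
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (3, 2)

# Encode a sparse concept vector (only feature 0 active) into the 2-d space.
concepts = np.array([1.0, 0.0, 0.0])
activation = concepts @ directions            # 2-d activation vector

# Linear readout: project the activation back onto each feature direction.
readout = directions @ activation
print(readout)  # approx. [1.0, -0.5, -0.5]: feature 0 dominates, the rest is interference
```

With sparse activations, this interference stays small, which is what lets superposition store more concepts than dimensions.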