

40 - Jason Gross on Compact Proofs and Interpretability
Mar 28, 2025
In this engaging talk, Jason Gross, a researcher in mechanistic interpretability and software verification, dives into the fascinating world of compact proofs. He discusses their crucial role in benchmarking AI interpretability and how they can be used to prove bounds on model performance. The conversation also touches on the challenges of randomness and noise in neural networks, the intersection of formal proofs and modern machine learning, and innovative approaches to enhancing AI reliability. Plus, learn about his startup focused on automating proof generation and the road ahead for AI safety!
Mechinterp as Compression
- Mechanistic interpretability (mechinterp) can be viewed as compressing explanations of model behavior.
- Proofs measure this compression by quantifying the length of the explanation and the tightness of the accuracy bound it supports (sketched below).
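To make the compression framing concrete, here is a minimal Python sketch, not from the episode: the Proof dataclass, the explanation_score function, and the numbers are all hypothetical. The point is only that an explanation is scored on two axes, how long the proof is and how tight an accuracy bound it certifies.

```python
from dataclasses import dataclass

@dataclass
class Proof:
    length: int            # e.g. proof steps or verification FLOPs (hypothetical unit)
    accuracy_bound: float  # proven lower bound on model accuracy, in [0, 1]

def explanation_score(proof: Proof, observed_accuracy: float) -> tuple[int, float]:
    """Score an explanation: shorter proofs and smaller bound gaps are better."""
    tightness_gap = round(observed_accuracy - proof.accuracy_bound, 4)
    return proof.length, tightness_gap

# Brute force checks every input: a very long proof, but a perfectly tight bound.
brute_force = Proof(length=10**9, accuracy_bound=0.98)
# A mechanistic explanation compresses the argument: far shorter, slightly looser.
compact = Proof(length=10**4, accuracy_bound=0.95)

for p in (brute_force, compact):
    print(explanation_score(p, observed_accuracy=0.98))
# (1000000000, 0.0) vs (10000, 0.03): the compact proof trades a little
# bound tightness for a huge reduction in length.
```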
Generalization and Case Analysis
- Networks don't treat every input uniquely, which enables case analysis in proofs.
- Grouping inputs into far fewer cases than there are distinct inputs is what makes the compression possible (see the sketch after this snip).
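A toy illustration of that case analysis, loosely modeled on the max-of-k task studied in compact-proofs work; the model function below is a stand-in, not a trained network, and the case counts are purely illustrative:

```python
from itertools import product

VOCAB, K = range(8), 3

def model(xs):
    # Stand-in for a trained network we want to verify; assumed correct here.
    return max(xs)

# Brute force: one case per input sequence -> 8**3 = 512 checks.
brute_cases = sum(1 for xs in product(VOCAB, repeat=K) if model(xs) == max(xs))

# Case analysis: if the proof establishes that the network depends only on the
# largest token (position and the other tokens are irrelevant), then one
# representative per possible max value suffices -> 8 checks.
compact_cases = sum(1 for m in VOCAB if model((m,) + (min(VOCAB),) * (K - 1)) == m)

print(brute_cases, compact_cases)  # 512 8
```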
Broader View of Symmetry
- Symmetry in networks encompasses more than mathematical rotations or reflections.
- It also means identifying which input bits are irrelevant and which relevant bits are treated alike, such as synonyms in language (illustrated below).
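One way to read that broader notion of symmetry is as invariance: the model's output doesn't change when you flip irrelevant bits or swap interchangeable inputs. A hypothetical Python sketch, with a toy model standing in for a real network:

```python
CANON = {"big": "large"}  # the toy model treats these synonyms identically

def model(tokens):
    # Stand-in network: counts distinct words, ignoring case and synonyms.
    return len({CANON.get(t.lower(), t.lower()) for t in tokens})

def invariant_under(transform, inputs) -> bool:
    """True if transforming the input never changes the model's output."""
    return all(model(x) == model(transform(x)) for x in inputs)

inputs = [("Big", "dog"), ("large", "Cat", "cat")]

# Irrelevant bits: letter case never affects the output.
print(invariant_under(lambda ts: tuple(t.swapcase() for t in ts), inputs))  # True
# Similar treatment of relevant bits: a synonym swap doesn't affect it either.
print(invariant_under(lambda ts: tuple("big" if t == "large" else t for t in ts),
                      inputs))  # True
```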