40 - Jason Gross on Compact Proofs and Interpretability
Mar 28, 2025
In this episode, Jason Gross, a researcher working on mechanistic interpretability and software verification, digs into compact proofs: their role in benchmarking AI interpretability and how they can certify bounds on model performance. The conversation also touches on the challenges of randomness and noise in neural networks, the intersection of formal proofs and modern machine learning, and approaches to making AI more reliable, plus his startup focused on automating proof generation and the road ahead for AI safety.
Compact proofs serve as a novel way to benchmark interpretability by succinctly describing the behavior of machine learning models.
The complexity of non-linearities in neural networks presents significant challenges for obtaining verifiable proofs of model behavior.
Advancements in AI are expected to enhance the automation of proof generation, improving the overall quality of interpretability in models.
Proof assistants like Lean and Coq need to evolve to handle larger proofs effectively while maintaining their accuracy and reliability.
Researchers are exploring the automation of proof translation across different proof assistants to maximize the utility of verified proofs in AI systems.
Deep dives
Introduction to Mechanistic Interpretability
Mechanistic interpretability focuses on understanding the inner workings of machine learning models, with an eye toward verifying their behavior and ensuring their reliability. The field has evolved significantly, drawing from approaches like formal verification, which provides guarantees about system behavior, albeit often in limited contexts. Jason Gross discusses the challenges and developments in this area, emphasizing the importance of compact proofs that can succinctly describe model behaviors. The goal is to improve the interpretability of complex models while ensuring that safety and effectiveness are maintained without compromising performance.
Compact Proofs and Mechanistic Explanations
The concept of compact proofs is introduced as a means to measure how well mechanistic explanations capture the behavior of machine learning models. Compact proofs aim to use formal and rigorous mathematical approaches to quantify the accuracy and reliability of interpretations made about a model's outputs. These proofs not only aim to establish correctness but also to compress the explanations, reducing unnecessary complexity in understanding model behavior. By emphasizing compactness, researchers intend to create a more manageable way to validate model functionalities and ensure alignment with intended outputs.
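As a rough illustration of this trade-off, the sketch below is a toy construction of my own (not the setup from the episode or the associated papers): it certifies a lower bound on a tiny handcrafted classifier's accuracy in two ways, once by exhaustively checking every input, and once with a cheaper certificate built from interval bounds over boxes of inputs, standing in for a mechanistic argument. The model, task, and "cost" bookkeeping are all illustrative assumptions.

```python
# Toy sketch of "proof cost vs. bound tightness" (illustrative assumptions
# throughout; this is not the setup from the compact-proofs papers).
import numpy as np

# Task: classify x in [-1, 1] as negative (class 0) or non-negative (class 1),
# over a finite grid of inputs.
xs = np.linspace(-1.0, 1.0, 10_001)
labels = (xs >= 0).astype(int)

# Handcrafted one-hidden-layer ReLU "model" with a 2-class head.
W1, b1 = np.array([[4.0], [-4.0]]), np.zeros(2)
W2, b2 = np.array([[-1.0, 1.0], [1.0, -1.0]]), np.zeros(2)

def forward(x):
    h = np.maximum(W1 @ np.atleast_1d(x) + b1, 0.0)
    return W2 @ h + b2  # logits for classes [0, 1]

def brute_force_certificate():
    """Tightest bound: evaluate every input. Cost = |domain| forward passes."""
    correct = sum(int(np.argmax(forward(x))) == y for x, y in zip(xs, labels))
    return correct / len(xs), len(xs)

def interval_forward(lo, hi):
    """Sound interval propagation through the network for all inputs in [lo, hi]."""
    l = np.minimum(W1 * lo, W1 * hi).sum(axis=1) + b1
    u = np.maximum(W1 * lo, W1 * hi).sum(axis=1) + b1
    l, u = np.maximum(l, 0.0), np.maximum(u, 0.0)           # ReLU
    out_lo = np.minimum(W2 * l, W2 * u).sum(axis=1) + b2
    out_hi = np.maximum(W2 * l, W2 * u).sum(axis=1) + b2
    return out_lo, out_hi

def compact_certificate(num_boxes=20):
    """Cheaper bound: certify whole boxes of same-label inputs at once."""
    certified = np.zeros_like(xs, dtype=bool)
    cost = 0
    for (lo_end, hi_end), cls in [((-1.0, -1e-9), 0), ((1e-9, 1.0), 1)]:
        edges = np.linspace(lo_end, hi_end, num_boxes // 2 + 1)
        for lo, hi in zip(edges[:-1], edges[1:]):
            cost += 1
            out_lo, out_hi = interval_forward(lo, hi)
            if out_lo[cls] > out_hi[1 - cls]:   # correct class provably wins on this box
                certified |= (xs >= lo) & (xs <= hi)
    return certified.mean(), cost

print(brute_force_certificate())  # exact accuracy, ~10,000 model evaluations
print(compact_certificate())      # valid lower bound from ~20 interval checks
```

The exhaustive certificate is as tight as possible but costs one evaluation per input, while the boxed certificate needs only a handful of checks; the compact-proofs framing asks how far a good mechanistic explanation lets you shrink that cost without giving up too much of the bound.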
Challenges of Proving Model Behavior
A significant challenge in mechanistic interpretability is the difficulty of obtaining proofs regarding model behavior, especially for complex models like neural networks. The discussion highlights how intricacies, such as non-linearities and their effects on output, complicate the proof process. Furthermore, model verification is often hindered by the sheer size of the models and the data they process. This necessitates innovative methods to distill complex behaviors into simpler, verifiable patterns, enabling a clearer understanding and better predictions of model outputs.
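One concrete way this difficulty shows up is sketched below (a toy example of my own, not one from the episode): sound worst-case methods such as interval arithmetic treat the branches of a non-linearity independently, so they can badly over-approximate a function the network actually computes exactly.

```python
# Toy illustration: interval bounds treat the two ReLU branches as independent,
# losing the fact that they came from the same input.
import numpy as np

relu = lambda v: np.maximum(v, 0.0)
f = lambda x: relu(x) + relu(-x)          # equals |x|, so its true max on [-1, 1] is 1

xs = np.linspace(-1.0, 1.0, 2001)
print(f(xs).max())                        # 1.0: the tight answer

def relu_interval(lo, hi):
    """Sound output range of ReLU for inputs in [lo, hi]."""
    return max(lo, 0.0), max(hi, 0.0)

lo1, hi1 = relu_interval(-1.0, 1.0)       # bounds for relu(x),  x in [-1, 1]
lo2, hi2 = relu_interval(-1.0, 1.0)       # bounds for relu(-x), x in [-1, 1]
print(lo1 + lo2, hi1 + hi2)               # certified range (0.0, 2.0): sound but 2x too loose
```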
Importance of Non-Linearities
Understanding non-linearities is critical in the context of mechanistic interpretability, as they often represent the most complex and least understood components of neural network models. These non-linear elements can significantly influence model predictions and understanding them properly could lead to enhanced interpretability and reliability. The podcast discusses how current research is targeting these non-linear aspects to develop better compression techniques for proofs. By focusing on non-linearities, researchers hope to uncover more about the interactions within the model and how they contribute to overall behavior.
Future Directions in Mechanistic Interpretability
The future of mechanistic interpretability looks promising, with expectations that advancements in artificial intelligence will enhance both proof generation and explanation quality. Jason Gross expresses optimism about the potential to automate the extraction of useful information from models and improve overall interpretability. This involves creating structures that not only understand the behavior of AI systems at a high level but also distill this understanding into actionable insights. As research progresses, the integration of compact proofs into mechanistic interpretability frameworks could pave the way for a more robust understanding of complex systems.
Role of Frontier AI in Program Verification
Frontier AI models are becoming increasingly capable of generating and assisting in proofs for software verification, an area that has traditionally relied on formal methods and proof assistants. By harnessing large language models trained on substantial datasets, researchers expect to automate various aspects of proof generation and verification. This shift could relieve human engineers from tedious verification tasks, speeding up the deployment of safe and reliable software. However, the balance between optimizing reasoning capabilities and ensuring safety remains a delicate matter.
Use of Proof Assistants in Automation
Proof assistants like Lean and Coq have shown potential for automating software verification but face challenges in scalability and ease of use. The conversation underscores the importance of improving these tools to accommodate the increasing complexity of large codebases and modern software systems. Researchers highlight that increasing the effectiveness of proof assistants hinges on their ability to manage larger and more intricate proofs without losing accuracy or reliability. This progress would not only enhance software safety but also contribute to the robustness of AI systems.
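For readers who have not used a proof assistant, here is a minimal Lean 4 sketch (my own toy, assuming a recent toolchain with the built-in `decide` and `omega` tactics) of two proof styles: a brute-force certificate whose checking cost grows with the size of the domain, and a short general argument that covers every case at once.

```lean
-- Brute-force certificate: the kernel evaluates the claim on all 64 cases,
-- so the cost of checking grows with the size of the domain.
example : (List.range 64).all (fun n => n % 2 == 0 || n % 2 == 1) = true := by
  decide

-- Compact certificate: one short general argument covers every natural number
-- (`omega` is Lean's built-in decision procedure for linear arithmetic).
example (n : Nat) : n % 2 = 0 ∨ n % 2 = 1 := by
  omega
```

Both proofs are checked by the same small trusted kernel; they differ in how much work that check costs and how general the statement is, which is the axis the compact-proofs framing cares about.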
Expanding the Scope of Proofs with AI
To effectively leverage AI for proof generation, researchers are investigating ways to automate the translation of verified proofs across different proof assistants. This approach could maximize the utility of existing verified proofs while minimizing the need for redundant work in proving similar properties in different systems. The idea is to create a versatile framework where proofs can be repurposed and adapted, providing a foundation for consistent verification across various applications. This could dramatically improve the efficiency of software verification and integration of safety properties in AI systems.
Challenges of Verification with Large Language Models
Although the potential of large language models for verifying code and generating proofs is promising, numerous challenges persist. Issues such as the need for substantial data, accuracy in output predictions, and the ability to handle complex interactions are highlighted as barriers to fully realizing this potential. Moreover, the trade-offs between using complex models and ensuring their reliability underscore the importance of maintaining rigorous standards in proof verification. As these models evolve, finding a balance that ensures safety without sacrificing functionality will be key.
The Path Forward for Proof Automation
Looking ahead, the future of proof automation and verification in AI systems presents exciting opportunities and challenges. The conversation concludes with a focus on building systems that can seamlessly integrate verification processes into the development of AI, ultimately ensuring safer and more reliable outputs. The hope is that as models continue to improve, their ability to provide valid proofs and guarantees will also enhance. By focusing on automated verification, researchers are paving the way toward a future where AI systems not only perform effectively but do so in a manner that can be trusted and verified.
How do we figure out whether interpretability is doing its job? One way is to see if it helps us prove things about models that we care about knowing. In this episode, I speak with Jason Gross about his agenda to benchmark interpretability in this way, and his exploration of the intersection of proofs and modern machine learning.
Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition (aka the Apollo paper on APD): https://arxiv.org/abs/2501.14926