Exploration of Mechanistic Interpretability for Model Performance Proofs

Explores proof strategies and the trade-off between compression and correspondence when explaining small transformers. The discussion emphasizes how mechanistic understanding helps generate compact proofs with tighter bounds, and addresses the challenges of reasoning about the errors introduced when model weights are compressed.
