The Nonlinear Library

AF - Formal verification, heuristic explanations and surprise accounting by Jacob Hilton

Jun 25, 2024
In this episode, Jacob Hilton describes ARC's current research, which focuses on combining mechanistic interpretability and formal verification. The episode covers formal verification and heuristic explanations for neural networks, and ARC's approach to safety and interpretability. It examines the challenges of formally verifying large neural networks and introduces heuristic explanations as an alternative. 'Surprise accounting' is presented as a way to assess how well a heuristic explanation accounts for a neural network's properties.