The Nonlinear Library

LW - SAE reconstruction errors are (empirically) pathological by wesg

Mar 29, 2024
SAE (sparse autoencoder) reconstruction errors are empirically pathological, causing significant changes in next-token prediction probabilities. Understanding these errors is crucial for advancing SAEs. The podcast explores the challenges of maintaining faithfulness, the impact of perturbations on activation vectors, the limitations of reconstruction in activation space, and the evaluation of SAE reconstructions with metrics such as KL divergence.
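To make the KL-divergence metric concrete, here is a minimal sketch, assuming a toy setup rather than a real model or SAE. It compares the next-token distribution from a "clean" activation with the distributions obtained after (a) adding a stand-in reconstruction error and (b) adding a random perturbation of the same L2 norm. The linear `readout`, the simulated "reconstruction" (clean activation plus small noise), and the helper names are all hypothetical illustrations, so the printed numbers say nothing about the episode's empirical findings.

```python
import math
import random


def log_softmax(logits):
    """Log-probabilities from logits, computed stably (avoids log(0))."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(clean_logits, patched_logits):
    """KL(P_clean || P_patched) between next-token distributions, in nats."""
    log_p = log_softmax(clean_logits)
    log_q = log_softmax(patched_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def random_same_norm(vec, rng):
    """Hypothetical baseline: a random direction rescaled to the L2 norm of `vec`."""
    direction = [rng.gauss(0.0, 1.0) for _ in vec]
    target = math.sqrt(sum(v * v for v in vec))
    current = math.sqrt(sum(d * d for d in direction))
    return [d * target / current for d in direction]


def readout(activation, unembed):
    """Toy linear readout standing in for the rest of the model:
    logits[j] = sum_i activation[i] * unembed[i][j]."""
    n_vocab = len(unembed[0])
    return [sum(a * row[j] for a, row in zip(activation, unembed)) for j in range(n_vocab)]


if __name__ == "__main__":
    rng = random.Random(0)
    d_model, n_vocab = 8, 5
    unembed = [[rng.gauss(0, 1) for _ in range(n_vocab)] for _ in range(d_model)]
    x = [rng.gauss(0, 1) for _ in range(d_model)]

    # Stand-in for an SAE reconstruction: NOT a trained SAE, just x plus small noise.
    x_hat = [xi + 0.1 * rng.gauss(0, 1) for xi in x]
    error = [xh - xi for xh, xi in zip(x_hat, x)]

    # Random perturbation with the same norm as the reconstruction error.
    x_rand = [xi + ri for xi, ri in zip(x, random_same_norm(error, rng))]

    clean = readout(x, unembed)
    print("KL (reconstruction):", kl_divergence(clean, readout(x_hat, unembed)))
    print("KL (random, same norm):", kl_divergence(clean, readout(x_rand, unembed)))
```

The same-norm random baseline is one way to ask whether a reconstruction error is unusually harmful for its size: if the KL from the reconstruction error is consistently larger than from equally sized random perturbations, the error is pointing in directions the downstream computation is especially sensitive to. Working in log-space keeps the KL computation finite even when some probabilities underflow.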