The Nonlinear Library

LW - Adam Optimizer Causes Privileged Basis in Transformer Language Models by Diego Caples

Sep 6, 2024