Deep Papers

Explaining Grokking Through Circuit Efficiency

Oct 17, 2023
The podcast explores the concept of grokking and its relationship with network performance. It discusses the use of circuits as modules, module addition and generalization, balancing cross entropy loss and weight decay in deep learning models, circuit efficiency and its role in performance, grokking and the impact on model strength, and the relationship between circuit efficiency and generalization.
Ask episode
Chapters
Transcript
Episode notes