Neel Nanda - Mechanistic Interpretability

Machine Learning Street Talk (MLST)

The Illusion of Grokking and Smooth Scaling Laws in Deep Learning

Grokking is in many ways something of an illusion. It arises from the overlap of two things: a phase transition, in which the model goes fairly suddenly from being unable to generalize to being able to generalize, and the phenomenon that memorizing is faster than generalizing.

Models are full of phase transitions, and if they follow a sudden distribution, smooth scaling laws happen.

Grokking is overhyped, and people significantly overestimate how much deep insight it gives us into how networks work.
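One way to picture the link between many sudden transitions and a smooth aggregate curve is a toy simulation. The sketch below is only an illustration, not something taken from the episode: it assumes each hypothetical "capability" contributes a loss of 1 until compute crosses its own threshold and 0 afterwards, and that the thresholds are spread log-uniformly between 1 and 10^6. The names `step_loss`, `aggregate_loss`, and `make_thresholds`, and all the numbers, are made up for this example.

```python
import random


def step_loss(compute, threshold):
    """Loss of one hypothetical capability: an abrupt drop from 1 to 0 at its threshold."""
    return 0.0 if compute >= threshold else 1.0


def aggregate_loss(compute, thresholds):
    """Average loss over all capabilities at a given compute budget."""
    return sum(step_loss(compute, t) for t in thresholds) / len(thresholds)


def make_thresholds(n, seed=0):
    """Assumption: thresholds are log-uniform between 1 and 1e6."""
    rng = random.Random(seed)
    return [10 ** rng.uniform(0, 6) for _ in range(n)]


thresholds = make_thresholds(1000)

# Compute budgets from 1 to 1e6, four points per decade.
compute_grid = [10 ** (k / 4) for k in range(25)]
curve = [aggregate_loss(c, thresholds) for c in compute_grid]

# Each individual capability jumps by 1.0 in a single step,
# but the averaged curve only moves a little between grid points.
largest_drop = max(a - b for a, b in zip(curve, curve[1:]))

for c, loss in zip(compute_grid[::4], curve[::4]):
    print(f"compute={c:>9.0f}  average loss={loss:.3f}")
print(f"largest drop between neighbouring grid points: {largest_drop:.3f}")
```

Every individual curve here is a hard step, yet the average declines gradually because the steps happen at different places. Whether real networks behave this way depends on how their transitions are actually distributed, which this toy does not address.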
