
Eric Michaud on scaling, grokking and quantum interpretability
The Inside View
The Power of Clusters in Language Models
There are a lot of ways in which this model could fail to describe what's actually going on in language models or real neural networks. So it's still an open question whether language data has this kind of structure. I think the interesting point the paper makes is that it's possible for this type of story to still give rise to what we observe with power-law scaling. It's maybe useful to have a counterpoint to the sort of "everything is smooth" type of story.