The Power of Clusters in Language Models
There are a lot of ways in which this model might fail to describe what's actually going on in language models or real neural networks. So it's still an open question whether language data has this kind of structure. I think the interesting point the paper makes is that this type of story can still give rise to the power law scaling we observe. It's maybe useful to have as a counterpoint to the "everything is smooth" type of story.