
Eric Michaud on scaling, grokking and quantum interpretability
The Inside View
The Power of Clusters in Language Models
There are a lot of ways in which this model could fail to describe what's actually going on in language models or real neural networks. So it's still an open question whether language data has this kind of structure. I think the interesting point the paper makes is that it's possible for this type of story to still give rise to what we observe with power-law scaling. It's maybe useful to have a counterpoint to the sort of "everything is smooth" type of story.