How to Predict Power Laws on Tokens
The Pythia models range from 70 million to 12 billion parameters. You can look at the loss as a function of model scale, that is, the number of model parameters, and you have the scaling curve. And it's like, oh yeah, maybe in general these seem more like tokens that involve facts, whereas things that look really smooth are based on intuition or heuristics.
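To make the idea of a per-token scaling curve concrete, here is a minimal sketch that fits a power law, loss ~ a * N^(-b), to a token's loss across model sizes and uses the R^2 of the log-log fit as a rough smoothness score. Several things here are assumptions, not details from the episode: the helper name `fit_power_law`, the use of R^2 as a smoothness measure, and the loss values, which are synthetic and invented purely for illustration. The intermediate model sizes come from the published Pythia suite and are not stated in the excerpt.

```python
import numpy as np

# Parameter counts for the Pythia suite, 70M to 12B.
# The intermediate sizes are not given in the excerpt.
PARAMS = np.array([70e6, 160e6, 410e6, 1.0e9, 1.4e9, 2.8e9, 6.9e9, 12e9])


def fit_power_law(n_params, losses):
    """Fit loss ~ a * N**(-b) by least squares in log-log space.

    Returns (a, b, r2), where r2 is the coefficient of determination
    of the straight-line fit to (log N, log loss).
    """
    x = np.log(n_params)
    y = np.log(losses)
    slope, intercept = np.polyfit(x, y, 1)
    pred = slope * x + intercept
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return np.exp(intercept), -slope, r2


# Synthetic per-token losses (hypothetical, for illustration only).
smooth = 50.0 * PARAMS ** -0.15             # follows a clean power law
abrupt = np.where(PARAMS < 2e9, 4.0, 0.5)   # loss drops suddenly at one scale

a, b, r2_smooth = fit_power_law(PARAMS, smooth)
_, _, r2_abrupt = fit_power_law(PARAMS, abrupt)

print(f"smooth token: a={a:.2f}, b={b:.3f}, R^2={r2_smooth:.3f}")
print(f"abrupt token: R^2={r2_abrupt:.3f}")
```

Under these assumptions, a token whose loss falls steadily with scale gets an R^2 close to 1, while a token whose loss drops suddenly at one model size fits the straight line poorly. Whether that split lines up with fact-like versus heuristic-like tokens is the speaker's conjecture, not something this sketch demonstrates.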