Eric Michaud on scaling, grokking and quantum interpretability

The Inside View

How to Predict Power Laws on Tokens

The Pythia models range from 70 million to 12 billion parameters. You can look at the loss as a function of model scale, that is, the number of model parameters, and you have the scaling curve. And it's like, oh yeah, maybe in general these seem more like tokens that involve facts, whereas things that look really smooth are based on intuition or heuristics.
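
To make the idea of a per-token scaling curve concrete, here is a minimal sketch, not the method used in the episode. It fits loss ≈ a · N^(-alpha) by linear regression in log-log space and reports how far the points stray from that line. The parameter counts are the eight Pythia sizes. The loss values are synthetic placeholders, and the function name `fit_power_law` and both example curves are assumptions made for illustration.

```python
import numpy as np

# Parameter counts of the eight Pythia models (70M to 12B).
PARAMS = np.array([70e6, 160e6, 410e6, 1e9, 1.4e9, 2.8e9, 6.9e9, 12e9])


def fit_power_law(n_params, losses):
    """Fit loss ~ a * N**(-alpha) with a straight line in log-log space.

    Returns (a, alpha, rms), where rms is the root-mean-square residual of
    the log-loss around the fitted line. A small rms means the curve is
    close to a clean power law.
    """
    log_n = np.log(n_params)
    log_loss = np.log(losses)
    slope, intercept = np.polyfit(log_n, log_loss, 1)
    residuals = log_loss - (slope * log_n + intercept)
    rms = float(np.sqrt(np.mean(residuals ** 2)))
    return float(np.exp(intercept)), float(-slope), rms


# Synthetic losses, NOT measurements from any model.
# A token whose loss falls smoothly with scale:
smooth_losses = 50.0 * PARAMS ** -0.2
# A token whose loss drops abruptly at some scale:
abrupt_losses = np.array([5.0, 5.0, 5.0, 5.0, 0.1, 0.1, 0.1, 0.1])

a_smooth, alpha_smooth, rms_smooth = fit_power_law(PARAMS, smooth_losses)
a_abrupt, alpha_abrupt, rms_abrupt = fit_power_law(PARAMS, abrupt_losses)

print(f"smooth: a={a_smooth:.2f}, alpha={alpha_smooth:.3f}, rms={rms_smooth:.3g}")
print(f"abrupt: alpha={alpha_abrupt:.3f}, rms={rms_abrupt:.3g}")
```

Comparing the residuals is one simple way to separate curves that look smooth from curves that do not. Whether the non-smooth ones really correspond to fact-like tokens is the speaker's tentative observation, not something this sketch can show.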
