AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
The Optimal Model Size for a Given Compute Budget Is a Power Law
The optimal model size as a function of compute is called an opt C. The power laws are surprisingly similar for all these, you know, seemingly to me at least pretty different domains. It feels like it wants to tell us something about some theoretical thing that we don't quite understand yet, I think is exciting.