The optimal model size as a function of compute is called N_opt(C). The power laws for it are surprisingly similar across all of these domains, which, to me at least, seem pretty different. It feels like it wants to tell us something about some underlying theory that we don't quite understand yet, which I think is exciting.
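To make the idea of a power law for optimal model size versus compute concrete, here is a minimal sketch of how such an exponent could be estimated: a straight-line least-squares fit in log-log space. The function name `fit_power_law`, the variable names, and the numbers are all hypothetical. The data points are synthetic, generated from a made-up power law, and are not results from any domain discussed here.

```python
import math

def fit_power_law(compute, n_opt):
    """Fit n_opt ~= a * compute**b by ordinary least squares on the logs."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(n) for n in n_opt]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Slope of the log-log line is the power-law exponent b.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Intercept of the log-log line is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Synthetic illustration only: points generated from an assumed law
# n_opt = 2.0 * compute**0.7 (made-up constants, not measured values).
true_a, true_b = 2.0, 0.7
compute = [10.0 ** k for k in range(-3, 3)]
n_opt = [true_a * c ** true_b for c in compute]

a, b = fit_power_law(compute, n_opt)
print(f"fitted prefactor a = {a:.3f}, exponent b = {b:.3f}")
```

Running the same fit separately on each domain and comparing the fitted exponents `b` is one simple way to check how similar the power laws are.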