How to Scale a Model
The next-token prediction itself is not significant. My guess is that this will not be a blocker. Maybe we'd be better off if it was, but it won't be. There are many sources of data in the world, and there are many ways that you can also generate data. And so I think it would slow you down if you couldn't scale just that simply. The focus would have to be on: what do we actually care about the model doing? In a sense, we're a little bit lucky that just predicting the next word gets us all these other things we need.