How to Train a Large Model for Scalability
The training process starts, and we call it V0 because there are subsequent versions. We decided to order the training data more or less sequentially in time for the FinPile part, the Bloomberg piece, which is about half of it; the rest was randomly shuffled. The most recent data would be our validation data. So we're training for four or five days, getting started, and we notice the training performance curve levels off as a result.
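The data ordering described here can be sketched roughly as follows. This is a minimal illustration, not the team's actual pipeline: the function name `build_training_order`, the `val_fraction` parameter, and the `(date, text)` document layout are all hypothetical, and how the two halves were interleaved during training is not stated, so the sketch returns them separately.

```python
import random
from datetime import date


def build_training_order(timed_docs, untimed_docs, val_fraction=0.1, seed=0):
    """Split a corpus into time-ordered training data, shuffled training data,
    and a validation set made of the most recent timestamped documents.

    timed_docs:   list of (date, text) pairs (stands in for the time-stamped half)
    untimed_docs: list of text strings (stands in for the randomly shuffled half)
    val_fraction: hypothetical share of timestamped docs held out for validation
    """
    # Sort the timestamped documents oldest-first.
    ordered = sorted(timed_docs, key=lambda doc: doc[0])

    # Hold out the most recent slice as validation data.
    n_val = max(1, int(len(ordered) * val_fraction))
    train_timed, validation = ordered[:-n_val], ordered[-n_val:]

    # Shuffle a copy of the remaining corpus with a fixed seed for repeatability.
    rng = random.Random(seed)
    shuffled = list(untimed_docs)
    rng.shuffle(shuffled)

    return train_timed, shuffled, validation


if __name__ == "__main__":
    timed = [(date(2020, m, 1), f"news {m}") for m in (5, 1, 9, 3, 12, 7, 2, 10, 4, 8)]
    untimed = [f"web page {i}" for i in range(5)]
    train_timed, train_shuffled, validation = build_training_order(timed, untimed)
    print("oldest training doc:", train_timed[0])
    print("validation docs:", validation)
```

One consequence of this layout is that the validation set is, by construction, newer than everything the model has seen in the time-ordered part, so any drift in the data over time shows up in the validation curve.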