AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
Gradient Norm Spikes in Training Performance and Validation Performance
After about eight or 10 days, we decided to stop that training. We were worried that maybe the curriculum learning line of sorting by timestamp was fine, but didn't seem to be working. And it seemed like that might have been a step too far into the unknown. So we randomly shuffled all the data. That was kind of the beginning of version one of training started off better for about how I think eight days or so,. nice improvement of the training performance and validation performance.