
Episode 13: Jonathan Frankle, MIT, on the lottery ticket hypothesis and the science of deep learning
Generally Intelligent
Is Early in Training Really Early?
There's this interesting property of the networks as soon as they start doing well, which is that they always find the same local optimum no matter what data order you use. So if you create these subnetworks at, like, I don't know, 1% of the way through training for ImageNet, they'll find optima that are not linearly connected. They won't land in the same convex region of the loss landscape. As soon as the accuracy of these subnetworks improves to the point where they match the accuracy of the full network, suddenly they start finding the same convex optimum. This has become a cornerstone of what's called linear mode connectivity. That's a whole other topic we can talk about.
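The test Frankle is describing can be made concrete in a few lines. Below is a minimal PyTorch sketch (not the paper's actual code) of the standard linear-interpolation check: evaluate the loss along the straight line between the weights of two trained copies of a network; if the loss stays roughly flat with no barrier, the two runs are linearly mode connected. The names `model_a`, `model_b`, `loss_fn`, and `data_loader` are assumed placeholders for two trained copies of the same architecture, a loss function, and an evaluation loader.

```python
import copy
import torch

def interpolation_losses(model_a, model_b, loss_fn, data_loader, num_points=11):
    """Evaluate loss along the linear path between two networks' weights.

    A flat curve (no loss barrier) suggests the two runs landed in the
    same linearly connected region of the loss landscape.
    """
    state_a = model_a.state_dict()
    state_b = model_b.state_dict()
    probe = copy.deepcopy(model_a)  # scratch model to hold mixed weights

    losses = []
    for i in range(num_points):
        alpha = i / (num_points - 1)
        # Interpolate every float tensor: (1 - alpha) * A + alpha * B.
        # Integer buffers (e.g. BatchNorm's num_batches_tracked) are copied as-is.
        mixed = {
            k: (1 - alpha) * state_a[k] + alpha * state_b[k]
            if state_a[k].is_floating_point() else state_a[k]
            for k in state_a
        }
        probe.load_state_dict(mixed)
        probe.eval()

        total, n = 0.0, 0
        with torch.no_grad():
            for x, y in data_loader:
                total += loss_fn(probe(x), y).item() * x.size(0)
                n += x.size(0)
        losses.append(total / n)
    return losses
```

In the experiments Frankle describes, `model_a` and `model_b` would be trained from the same early checkpoint with different data orders; the question is at what point in training the resulting `losses` curve stops showing a barrier.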