
Episode 13: Jonathan Frankle, MIT, on the lottery ticket hypothesis and the science of deep learning
Generally Intelligent
Is Early in Training Really Early?
There's this interesting property of the networks as soon as they start doing well, which is that they always find the same local optimum no matter what data order you use. So if you create these subnetworks at, like, I don't know, 1% of the way through training for ImageNet, they'll find optima that are not linearly connected. They won't land in the same convex region of the loss landscape. As soon as the accuracy of these subnetworks improves to the point where they match the accuracy of the full network, suddenly they start finding the same convex optimum. This has become a cornerstone of what's called linear mode connectivity. That's a whole other topic we can talk about.
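The test Frankle is describing can be made concrete in a few lines. Below is a minimal PyTorch sketch (not the paper's actual code) of the standard linear-interpolation check: evaluate the loss along the straight line between the weights of two trained copies of a network; if the loss stays roughly flat with no barrier, the two runs are linearly mode connected. The names `model_a`, `model_b`, `loss_fn`, and `data_loader` are assumed placeholders for two trained copies of the same architecture, a loss function, and an evaluation loader.

```python
import copy
import torch

def interpolation_losses(model_a, model_b, loss_fn, data_loader, num_points=11):
    """Evaluate loss along the linear path between two networks' weights.

    A flat curve (no loss barrier) suggests the two runs landed in the
    same linearly connected region of the loss landscape.
    """
    state_a = model_a.state_dict()
    state_b = model_b.state_dict()
    probe = copy.deepcopy(model_a)  # scratch model to hold mixed weights

    losses = []
    for i in range(num_points):
        alpha = i / (num_points - 1)
        # Interpolate every float tensor: (1 - alpha) * A + alpha * B.
        # Integer buffers (e.g. BatchNorm's num_batches_tracked) are copied as-is.
        mixed = {
            k: (1 - alpha) * state_a[k] + alpha * state_b[k]
            if state_a[k].is_floating_point() else state_a[k]
            for k in state_a
        }
        probe.load_state_dict(mixed)
        probe.eval()

        total, n = 0.0, 0
        with torch.no_grad():
            for x, y in data_loader:
                total += loss_fn(probe(x), y).item() * x.size(0)
                n += x.size(0)
        losses.append(total / n)
    return losses
```

In the experiments Frankle describes, `model_a` and `model_b` would be trained from the same early checkpoint with different data orders; the question is at what point in training the resulting `losses` curve stops showing a barrier.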