
101 - The lottery ticket hypothesis, with Jonathan Frankle
NLP Highlights
Scaling Up the Lottery Ticket Hypothesis
Jonathan: The original experiment doesn't work once you get to a certain model complexity or task size. But there are two ways of showing how this behavior does actually scale up, and I think each of those ways ends up providing some interesting insights into how deep learning works in general. We found something quite interesting, a phenomenon we're referring to as instability. So we wanted to measure how unstable the network is to the noise of stochastic gradient descent. And then we want to compare these networks somehow, to understand how much SGD noise really affected the optimization process, by seeing how different these two networks ended up. It's very exciting, because we have this ability to compare these networks.
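The transcript doesn't spell out the comparison metric, but one concrete way to instantiate this kind of instability probe is a sketch like the following: train two copies of the same initialization under different SGD noise (here, different data-shuffling seeds), then evaluate the loss along the straight line between the two final weight vectors. A large bump along the path suggests the two runs landed in different basins. This is an illustration under assumed details, not necessarily the exact procedure discussed in the episode; `make_model`, the toy data, and all hyperparameters are placeholders.

```python
import copy
import torch
import torch.nn as nn

def make_model():
    # Placeholder architecture; any network would do for the probe.
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

def train(model, X, y, shuffle_seed, epochs=20, lr=0.1):
    # shuffle_seed controls the order examples are seen -- the "SGD noise".
    g = torch.Generator().manual_seed(shuffle_seed)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for idx in torch.randperm(len(X), generator=g).split(32):
            opt.zero_grad()
            loss_fn(model(X[idx]), y[idx]).backward()
            opt.step()
    return model

def eval_loss(model, X, y):
    with torch.no_grad():
        return nn.CrossEntropyLoss()(model(X), y).item()

def interpolation_losses(model_a, model_b, X, y, steps=11):
    # Loss along the linear path between the two solutions; a spike in
    # the middle means SGD noise pushed the runs into different basins.
    sa, sb = model_a.state_dict(), model_b.state_dict()
    probe = make_model()
    losses = []
    for t in torch.linspace(0, 1, steps):
        probe.load_state_dict({k: (1 - t) * sa[k] + t * sb[k] for k in sa})
        losses.append(eval_loss(probe, X, y))
    return losses

torch.manual_seed(0)
X, y = torch.randn(512, 20), torch.randint(0, 2, (512,))  # toy data

init = make_model()  # one shared initialization for both runs
run_a = train(copy.deepcopy(init), X, y, shuffle_seed=1)
run_b = train(copy.deepcopy(init), X, y, shuffle_seed=2)

print(interpolation_losses(run_a, run_b, X, y))
```

The key design choice is that the two runs differ *only* in their data order, so any gap between the resulting networks is attributable to SGD noise rather than to initialization.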