
101 - The lottery ticket hypothesis, with Jonathan Frankle
NLP Highlights
Linear Interpolation Is Not Enough
We trained on two different samples of SGD noise, and found the same linearly connected minimum. When they're separated by a peak, when loss increases, we call them unstable. And so interestingly, this linear connectivity seems to correlate with when the lottery ticket phenomenon starts working. We were inspired by a number of other papers that have shown phenomena such as: you can train a network, you can get multiple copies of it near the minimum, and you can actually average the networks and you'll get a network that performs better. It's something that seems rather strange, but it's actually something that has a lengthy legacy.
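The linear connectivity check described here can be sketched in a few lines. This is a minimal illustration, not the method from the episode: the double-well `toy_loss`, the helper names `interpolate` and `loss_barrier`, and the choice to measure the barrier as the peak loss on the path minus the mean of the two endpoint losses are all assumptions made for the example.

```python
import numpy as np


def toy_loss(w):
    """Hypothetical loss: each coordinate has minima at -1 and +1."""
    w = np.asarray(w, dtype=float)
    return float(np.sum((w ** 2 - 1.0) ** 2))


def interpolate(w_a, w_b, alpha):
    """Point on the straight line between two weight vectors."""
    w_a = np.asarray(w_a, dtype=float)
    w_b = np.asarray(w_b, dtype=float)
    return (1.0 - alpha) * w_a + alpha * w_b


def loss_barrier(loss_fn, w_a, w_b, num_points=11):
    """Peak loss along the linear path minus the mean endpoint loss.

    A value near zero suggests the two solutions are linearly connected
    ("stable"); a clearly positive value means a peak separates them
    ("unstable").
    """
    alphas = np.linspace(0.0, 1.0, num_points)
    path_losses = [loss_fn(interpolate(w_a, w_b, a)) for a in alphas]
    endpoint_mean = 0.5 * (path_losses[0] + path_losses[-1])
    return max(path_losses) - endpoint_mean


# Two runs that landed in the same basin: no peak between them.
same_basin = loss_barrier(toy_loss, [1.0, 1.0], [1.0, 1.0])

# Two runs that landed in different basins: the path crosses a peak.
different_basin = loss_barrier(toy_loss, [-1.0, 1.0], [1.0, 1.0])

# Averaging the weights is just the midpoint of the same path (alpha = 0.5),
# so it only helps when the barrier is low.
averaged = interpolate([-1.0, 1.0], [1.0, 1.0], 0.5)
```

In this toy setup, averaging the two different-basin solutions lands on the peak, which is why averaging is only expected to work when the two networks sit in the same linearly connected region.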