
Artificial Intelligence & Large Language Models: Oxford Lecture — #35

Manifold

Large Width Expansion and Neural Tangent Kernels

In the limit of large width, the network gets very wide and is, in some sense, over-parameterized relative to the input information, or to something about the problem that you're trying to solve. In that limit, and when you initialize the network in a kind of random way, you have this big, wide, over-parameterized network, and then you use it. And given this particular set of assumptions, they can prove that the optimization of the model to its global minimum actually behaves like a convex problem. That's important because it means that in polynomial time you can train this thing and get to basically the global minimum. But at least there's some limiting case of
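A minimal sketch of the idea being described, assuming plain JAX and an illustrative two-layer network (the widths, names, and parameterization are my assumptions, not something from the episode): at a random initialization, a very wide network has an associated "neural tangent kernel", the inner product of its parameter gradients at pairs of inputs. In the infinite-width limit this kernel stays essentially fixed during training, so gradient descent on the network reduces to kernel regression with that kernel, which is a convex problem with a reachable global minimum.

```python
# Sketch: empirical Neural Tangent Kernel of a wide, randomly initialized MLP.
# Illustrative only; widths and names are assumptions for the example.
import jax
import jax.numpy as jnp

WIDTH = 4096  # "large width": hidden layer much wider than the input


def init_params(key, d_in, width, d_out=1):
    k1, k2 = jax.random.split(key)
    # NTK-style parameterization: standard-normal weights, with the
    # 1/sqrt(fan_in) scaling applied inside the forward pass.
    return {
        "W1": jax.random.normal(k1, (width, d_in)),
        "W2": jax.random.normal(k2, (d_out, width)),
    }


def forward(params, x):
    d_in = x.shape[-1]
    h = jnp.tanh(params["W1"] @ x / jnp.sqrt(d_in))
    return (params["W2"] @ h / jnp.sqrt(WIDTH))[0]  # scalar output


def empirical_ntk(params, x1, x2):
    # k(x1, x2) = <df/dtheta(x1), df/dtheta(x2)>: inner product of the
    # parameter gradients. In the infinite-width limit this kernel is
    # (approximately) constant through training, and fitting the data
    # becomes convex kernel regression with this kernel.
    g1 = jax.grad(forward)(params, x1)
    g2 = jax.grad(forward)(params, x2)
    leaves1 = jax.tree_util.tree_leaves(g1)
    leaves2 = jax.tree_util.tree_leaves(g2)
    return sum(jnp.vdot(a, b) for a, b in zip(leaves1, leaves2))


key = jax.random.PRNGKey(0)
params = init_params(key, d_in=8, width=WIDTH)
x1, x2 = jnp.ones(8), jnp.arange(8.0) / 8.0
print(empirical_ntk(params, x1, x2))
```

At this width, re-running with different random keys gives nearly the same kernel value, which is the sense in which the randomly initialized, over-parameterized network has a well-defined limiting behavior.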

