
Artificial Intelligence & Large Language Models: Oxford Lecture — #35
Manifold
Large Width Expansion and Neural Tangent Kernels
In the limit of large width, the network gets very wide and, in some sense, is over-parameterized relative to the input information, or to the problem you're trying to solve. In that limit, when you initialize the network randomly, you have this big, wide, over-parameterized network, and then you train it. Given that particular set of assumptions, they can prove that optimizing the model to its global minimum is effectively a convex problem. That's important because it means you can train this thing in polynomial time and get essentially to the global minimum. But at least there's some limiting case of…
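A minimal sketch of the result being described, in the language of the neural tangent kernel (the notation below is mine, not from the lecture): in the infinite-width limit, the trained network behaves like its linearization around the random initialization, and the training dynamics are governed by a fixed kernel.

```latex
% Linearization of a network f(x; \theta) around its random initialization \theta_0
% (valid as width -> infinity; this is the standard NTK argument, not a quote from the lecture)
f(x;\theta) \;\approx\; f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^\top (\theta - \theta_0)

% The neural tangent kernel, which stays (nearly) constant during training at large width:
\Theta(x, x') \;=\; \big\langle \nabla_\theta f(x;\theta_0),\; \nabla_\theta f(x';\theta_0) \big\rangle

% Under gradient flow on the squared loss, the outputs on the training set evolve linearly:
\frac{\mathrm{d} f_t}{\mathrm{d} t} \;=\; -\,\Theta\,\big(f_t - y\big)

% Because the loss is quadratic (hence convex) in the linearized parameters, and \Theta is
% positive definite with high probability at random initialization, gradient descent drives
% the training loss to its global minimum at a rate set by the smallest eigenvalue of \Theta.
```

This is the sense in which the otherwise non-convex training problem becomes, in the wide limit, an effectively convex one that can be solved to its global minimum in polynomial time.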