
126 - Optimizing Continuous Prompts for Generation, with Lisa Li
NLP Highlights
Why Might Optimization Be Unstable?
Host: So, just a couple of questions about why optimization might be unstable. Is it that when you already have a large network with the rest of the weights in a certain state, and you add this prefix drawn from, say, a random distribution of weights, the two are very different? Do you think that's why optimization is unstable here? Do you have any intuitions on why that might be the case?

Lisa Li: Yeah, definitely. I think that's one possible explanation. We didn't go very deep into this question, but intuitively it could be that when we randomly initialize the prefix, it lies in a very different space than when we actually process words and compute these latent activations, however...
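The intuition above can be sketched in code: a randomly initialized prefix starts far from the distribution of activations the frozen network normally sees, whereas initializing it from the embeddings of real tokens keeps it in that space. This is a minimal, hypothetical PyTorch illustration; the sizes, the embedding table, and the token choices are placeholders, not details from the episode or the paper.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration only.
vocab_size, hidden_dim, prefix_len = 100, 16, 5

# Stand-in for the frozen language model's embedding table.
embedding = nn.Embedding(vocab_size, hidden_dim)

# Option 1: random initialization. The prefix starts in a very
# different space from real word activations, which is the guest's
# candidate explanation for unstable optimization.
prefix_random = nn.Parameter(torch.randn(prefix_len, hidden_dim))

# Option 2: initialize from embeddings of actual tokens, so the
# trainable prefix starts in the same space the network already
# processes. (Token ids here are arbitrary placeholders.)
token_ids = torch.arange(prefix_len)
prefix_from_words = nn.Parameter(embedding(token_ids).detach().clone())
```

Under this sketch, only the prefix parameters would be updated during training while the rest of the network stays frozen.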