How to Fit a Model in Memory
It's interesting to hear that this freezing was kind of an approach to fitting things in memory. I'm curious, then: you also have results in the paper on end-to-end fine-tuning, where you presumably don't freeze things, so how did that work? Did you also have tricks for fitting into memory, or did that involve something, I don't know, more memory somehow?

Yeah, it's interesting. As you said before, you were underfitting, and one natural response to underfitting is just to make the model larger, right? Give it more capacity to learn.