How to Fit a Model in Memory

It's interesting to hear that this freezing was kind of an approach to fitting things in memory. I'm curious, then: you also have results in the paper on end-to-end fine-tuning, where you presumably don't freeze things, so how did that work? Did you also have tricks for fitting into memory, or did that involve, I don't know, more memory somehow? Yeah, it's interesting. As you said before, you were underfitting, and one natural response to underfitting is to just make the model larger, right? Give it more capacity to learn.

Play episode from 32:48
