
David Bau on Editing Facts in GPT, AI Safety and Interpretability

The Inside View

The Constrained Memory of Large Language Models

If you have a layer of neurons, then the connections that those neurons have to the previous layer form a big matrix. So if it's a 4,000-dimensional space, that could be 16 million parameters in this matrix. You can describe changing a single entry as taking the outer product of two one-hot vectors, one for the row and one for the column. What a rank-one modification of a matrix does is generalize that: instead of necessarily having a one-hot vector on the rows and a one-hot vector on the columns, you could have any vector that describes your rows, and likewise for your columns. But that pattern is still very constrained. It is actually…
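
A minimal NumPy sketch of that idea (my own illustration, not code from the episode; all names are made up): write a single-entry change as the outer product of two one-hot vectors, then swap in arbitrary vectors to get a general rank-one update.

```python
import numpy as np

d = 8                              # small stand-in for the 4,000-dim layer
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))    # d*d parameters (16 million at d = 4,000)

# Changing the single entry W[i, j] by delta is the outer product of two
# one-hot vectors, one selecting the row and one selecting the column.
i, j, delta = 2, 5, 0.7
e_i = np.zeros(d); e_i[i] = 1.0    # one-hot row vector
e_j = np.zeros(d); e_j[j] = 1.0    # one-hot column vector
W_entry = W + delta * np.outer(e_i, e_j)
assert np.isclose(W_entry[i, j], W[i, j] + delta)  # the (i, j) entry moved by delta

# A rank-one modification generalizes this: replace the one-hot vectors with
# any row direction u and column direction v.
u = rng.standard_normal(d)
v = rng.standard_normal(d)
W_new = W + np.outer(u, v)

# The update is still very constrained: u and v are only 2*d numbers, not
# d*d, and the change to W has rank one.
assert np.linalg.matrix_rank(np.outer(u, v)) == 1
```

The 2d-versus-d² gap is the sense in which the pattern is constrained: at d = 4,000, a rank-one edit touches 16 million matrix entries but is described by only 8,000 numbers.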
