The Cost of Loading Up Tokens on the GPU
The next challenge will be the cost of loading up all these tokens on the GPU, especially if you have to do that over the network. I would doubt that an 8K window would help that much, simply because the model is not even going to be dealing with 8K tokens in practice on those tasks. It just remains coherent for a lot longer. And when you get to things like programming tasks, where that may be necessary, it's really helpful. That, to me, is my guess as to why C4 does so well, even though by all rights it is a pretty old, bad data set.