The Parallel Processing of Transformers
Transformers are an example of a deconstraint. The update that you make after using the transformer is localized in a good way. So whatever I do at a given transformer position, I can change relatively readily. If, on the other hand, I have an open-ended run through things, the update that I need to make is much less localized. And I believe that's one of the things that contributes to the scaling advantages.

That's great. Yeah. Okay, let me see how to best... Yeah, so, it was a while ago that [inaudible]. You're good. So, locally, the transformer looks at a bunch of tokens. It doesn't look at all tokens, and...
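To make the contrast concrete, here is a minimal sketch in plain NumPy (the function names, toy shapes, and weight initialization are illustrative assumptions, not taken from the conversation). A transformer layer maps all token states in one shot, so a change to its weights affects the computation in a localized way; a recurrent net threads each state through the previous one, so any change propagates through every earlier step.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention: every position attends over the
    # same set of keys, so all positions are computed in parallel.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def transformer_layer(x, W_q, W_k, W_v):
    # One matrix of token states in, one out: there is no
    # step-by-step dependency between positions.
    return attention(x @ W_q, x @ W_k, x @ W_v)

def rnn(x, W_h, W_x, h0):
    # Recurrent alternative: each hidden state depends on the
    # previous one, so the computation (and any update to it)
    # runs through every earlier step.
    h = h0
    for t in range(x.shape[0]):
        h = np.tanh(h @ W_h + x[t] @ W_x)
    return h

# Toy usage: 8 tokens of dimension 16, all processed at once.
rng = np.random.default_rng(0)
n, d = 8, 16
x = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
out = transformer_layer(x, W_q, W_k, W_v)  # shape (8, 16), no sequential loop
```

The design point the speaker is gesturing at: the transformer's per-layer computation has no sequential dependency chain across positions, whereas the RNN's loop couples every step to the ones before it.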