The Parallel Processing of Transformers
Transformers are an example of a deconstraint. The update that you make after using the transformer is localized in a good way. So whatever I do at a given transformer position, I can change relatively readily. If, on the other hand, I have an open-ended run through things, the update that I need to make is much less localized. And I believe that's one of the things that contributes to the scaling advantages.

That's great. Yeah. Okay, let me see how to best... Yeah, so, it was a while ago that [inaudible]. You're good. So, locally, the transformer looks at a bunch of tokens. It doesn't look at all tokens, and...
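To make the contrast concrete, here is a minimal sketch in plain NumPy (the function names, toy shapes, and weight initialization are illustrative assumptions, not taken from the conversation). A transformer layer maps all token states in one shot, so a change to its weights affects the computation in a localized way; a recurrent net threads each state through the previous one, so any change propagates through every earlier step.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention: every position attends over the
    # same set of keys, so all positions are computed in parallel.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def transformer_layer(x, W_q, W_k, W_v):
    # One matrix of token states in, one out: there is no
    # step-by-step dependency between positions.
    return attention(x @ W_q, x @ W_k, x @ W_v)

def rnn(x, W_h, W_x, h0):
    # Recurrent alternative: each hidden state depends on the
    # previous one, so the computation (and any update to it)
    # runs through every earlier step.
    h = h0
    for t in range(x.shape[0]):
        h = np.tanh(h @ W_h + x[t] @ W_x)
    return h

# Toy usage: 8 tokens of dimension 16, all processed at once.
rng = np.random.default_rng(0)
n, d = 8, 16
x = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
out = transformer_layer(x, W_q, W_k, W_v)  # shape (8, 16), no sequential loop
```

The design point the speaker is gesturing at: the transformer's per-layer computation has no sequential dependency chain across positions, whereas the RNN's loop couples every step to the ones before it.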