
Neel Nanda on Avoiding an AI Catastrophe with Mechanistic Interpretability
Future of Life Institute Podcast
A Transformer Is a Sequence Modeling Thing
The model is made up of these layers, these simple functions. A transformer should work on a sequence with one word just as well as on a sequence with a thousand words. And transformers are made up of alternating attention and MLP layers. So at its heart, because the model is a sequence modeling thing, it's doing things in parallel on each word.
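To make that structure concrete, here is a minimal NumPy sketch of one transformer block, alternating an attention layer with an MLP layer. The weight names and shapes are hypothetical illustrations, not code from the episode; residual connections are included since standard transformers use them. Attention is the only step that moves information between positions, while the MLP runs independently, in parallel, at every position, which is why the same block handles a one-word or thousand-word sequence unchanged.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, Wq, Wk, Wv):
    # The only step where positions exchange information:
    # every position attends over the whole sequence.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

def mlp(x, W1, W2):
    # Applied independently (in parallel) at each position.
    return np.maximum(0, x @ W1) @ W2

def transformer_block(x, params):
    x = x + attention(x, *params["attn"])  # residual + attention
    x = x + mlp(x, *params["mlp"])         # residual + MLP
    return x

# Hypothetical sizes for illustration only.
rng = np.random.default_rng(0)
d_model, d_mlp = 16, 64
params = {
    "attn": [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3)],
    "mlp": [rng.normal(size=(d_model, d_mlp)) * 0.1,
            rng.normal(size=(d_mlp, d_model)) * 0.1],
}

# The same block works on a 1-token and a 1000-token sequence.
for seq_len in (1, 1000):
    x = rng.normal(size=(seq_len, d_model))
    print(seq_len, transformer_block(x, params).shape)
```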