
How Does AI Work? (Robert Wright & Timothy Nguyen)
Robert Wright's Nonzero
The Transformer Architecture for Large Language Model Success
If we only had three dimensions, we might think of words as being located on a kind of three-dimensional semantic map. That would be true if there were only three dimensions, but these models use a lot more, so we can't quite conceive of the map. With more dimensions and more parameters you can model more things; you can be more expressive. There's more room to move. Typically these words live in a roughly thousand-dimensional space. The key to the transformer architecture underlying the recent language model success is that you can condition these vectors on the context. So it's a context-dependent embedding, and that's a large part of why these models are so powerful.
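To make the "context-dependent embedding" point concrete, here is a minimal Python sketch (not from the episode): a static lookup table gives the word "bank" the same vector in every sentence, while a single self-attention pass mixes in the neighboring words, so "bank" in "the river bank" ends up with a different vector than "bank" in "a loan from the bank." The toy vocabulary, the 8-dimensional embeddings, and the random stand-in weight matrices are all illustrative assumptions, not the models discussed in the episode.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and a static embedding table: one fixed vector per word,
# regardless of the sentence it appears in. Real models use on the order of
# a thousand dimensions; we use 8 so the arrays stay readable.
vocab = ["the", "river", "bank", "a", "loan", "from"]
dim = 8
embedding_table = {w: rng.normal(size=dim) for w in vocab}

# Shared weight matrices (random stand-ins for learned query/key/value
# projections), fixed across all sentences.
Wq, Wk, Wv = (rng.normal(size=(dim, dim)) for _ in range(3))

def static_embed(sentence):
    """Look up each word's fixed vector -- identical in every context."""
    return np.stack([embedding_table[w] for w in sentence])

def contextual_embed(sentence):
    """One self-attention pass: each output vector is a weighted mix of the
    whole sentence, so the vector for 'bank' depends on its neighbors."""
    x = static_embed(sentence)                       # (seq_len, dim)
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(dim)                  # pairwise attention scores
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ v                               # context-mixed vectors

s1 = ["the", "river", "bank"]
s2 = ["a", "loan", "from", "the", "bank"]

# Static embedding: 'bank' gets the same vector in both sentences.
print(np.allclose(static_embed(s1)[2], static_embed(s2)[4]))   # True

# Contextual embedding: 'bank' gets different vectors once context is mixed in.
print(np.allclose(contextual_embed(s1)[2], contextual_embed(s2)[4]))  # False
```

The design choice this illustrates is the one made in the excerpt: the embedding is no longer a property of the word alone but of the word in its sentence, which is what the transformer's attention mechanism provides.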