The Evolution of Transformers
The architecture of transformers makes that considerably easier to scale. I mean, it's still a problem how to scale up really, really big transformer networks, but there was a time when it was exceptionally difficult to write parallel systems, transmitting information between graphics cards. The transformer architecture makes that considerably easier to scale. There are constraints on the flow of information within the system, and by kind of minimizing the amount of necessary flow, writing and scaling transformers is much easier than it was pre-transformers.

Interesting. Could you talk about something, because transformers, if you think about gradient descent as like a crucial bit of, like, instead of writing all your learning
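The point about minimizing necessary information flow can be made concrete with a small sketch (not from the conversation itself, and the module names and sizes below are illustrative): a transformer block is built from large batched matrix multiplies that run independently on each example in the batch, so splitting a batch across GPUs only requires synchronizing gradients once per training step.

```python
# Minimal sketch, assuming PyTorch: a transformer block whose forward pass
# needs no communication between examples, so data parallelism across GPUs
# only has to exchange gradients at the end of each backward pass.
import torch
import torch.nn as nn

class TinyTransformerBlock(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Self-attention and the feed-forward network both act on the whole
        # batch at once; each example is processed independently.
        a, _ = self.attn(x, x, x, need_weights=False)
        x = self.norm1(x + a)
        return self.norm2(x + self.ff(x))

device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyTransformerBlock().to(device)
if torch.cuda.device_count() > 1:
    # Splits each batch across the available GPUs; the only cross-device
    # traffic is gathering the results (and, in training, the gradients).
    model = nn.DataParallel(model)

x = torch.randn(8, 128, 256, device=device)  # (batch, sequence, model dim)
print(model(x).shape)  # torch.Size([8, 128, 256])
```

This is only one way to parallelize (data parallelism); the same property, that most of the work is dense matrix multiplication with limited cross-device dependencies, is what makes other schemes such as tensor or pipeline parallelism workable for very large transformer networks.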