The Evolution of Transformers
The architecture of transformers makes that considerably easier to scale. I mean, it's still a problem how to scale up really, really big transformer networks, but there was a time when it was exceptionally difficult to write parallel systems, transmitting information between graphics cards. The transformer architecture makes that considerably easier to scale. There are constraints on the flow of information within the system, and by kind of minimizing the amount of necessary flow, writing and scaling transformers is much easier than it was pre-transformers.

Interesting. Could you talk about something, because transformers, if you think about gradient descent as like a crucial bit of, like, instead of writing all your learning
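The point about minimizing necessary information flow can be made concrete with a small sketch (not from the conversation itself, and the module names and sizes below are illustrative): a transformer block is built from large batched matrix multiplies that run independently on each example in the batch, so splitting a batch across GPUs only requires synchronizing gradients once per training step.

```python
# Minimal sketch, assuming PyTorch: a transformer block whose forward pass
# needs no communication between examples, so data parallelism across GPUs
# only has to exchange gradients at the end of each backward pass.
import torch
import torch.nn as nn

class TinyTransformerBlock(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Self-attention and the feed-forward network both act on the whole
        # batch at once; each example is processed independently.
        a, _ = self.attn(x, x, x, need_weights=False)
        x = self.norm1(x + a)
        return self.norm2(x + self.ff(x))

device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyTransformerBlock().to(device)
if torch.cuda.device_count() > 1:
    # Splits each batch across the available GPUs; the only cross-device
    # traffic is gathering the results (and, in training, the gradients).
    model = nn.DataParallel(model)

x = torch.randn(8, 128, 256, device=device)  # (batch, sequence, model dim)
print(model(x).shape)  # torch.Size([8, 128, 256])
```

This is only one way to parallelize (data parallelism); the same property, that most of the work is dense matrix multiplication with limited cross-device dependencies, is what makes other schemes such as tensor or pipeline parallelism workable for very large transformer networks.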