
Neuroscientist Konrad Kording reveals shocking truth about machine learning and the brain

The Rhys Show


The Evolution of Transformers

The architecture of transformers makes that considerably easier to scale. I mean, it's still a problem how to scale up really, really big transformer networks, but there was a time when it was exceptionally difficult to write parallel systems, transmitting information between graphics cards. The architecture of transformers makes that considerably easier to scale. There are constraints on the flow of information within the system, and by kind of minimizing the amount of necessary flow, writing and scaling transformers is much easier than it was pre-transformers.

Interesting. Could you talk about something? Because transformers, if you think about gradient descent as like a crucial bit, instead of writing all your learning...
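The parallelism point is worth making concrete. The speakers don't give code, so the following is a minimal illustrative sketch in plain NumPy, with toy shapes and function names of my own choosing: a recurrent network must process tokens one after another, because each hidden state depends on the previous one, while self-attention computes every position in a few batched matrix multiplies with no serial dependency along the sequence. That lack of sequential dependency is what makes the computation easy to spread across hardware.

```python
import numpy as np

def rnn_step_by_step(x, W_h, W_x):
    """Sequential recurrence: step t depends on step t-1,
    so the time dimension cannot be parallelized."""
    h = np.zeros(W_h.shape[0])
    hs = []
    for x_t in x:  # forced serial loop over the sequence
        h = np.tanh(W_h @ h + W_x @ x_t)
        hs.append(h)
    return np.stack(hs)

def self_attention(x, W_q, W_k, W_v):
    """All positions attend to all positions via batched matrix
    multiplies: no serial dependency along the sequence, which is
    what makes the transformer architecture easy to parallelize."""
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Numerically stable softmax over the attention scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy shapes, purely illustrative: 8 tokens, model width 16.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
W_h, W_x = rng.normal(size=(16, 16)), rng.normal(size=(16, 16))
W_q, W_k, W_v = (rng.normal(size=(16, 16)) for _ in range(3))

print(rnn_step_by_step(x, W_h, W_x).shape)    # (8, 16), computed serially
print(self_attention(x, W_q, W_k, W_v).shape)  # (8, 16), computed in parallel
```

In the recurrent version the loop body cannot start step t before step t-1 finishes; in the attention version every row of the output is independent, so the work can be split across cores or graphics cards with only the weight matrices and activations crossing device boundaries, which is the "minimized necessary flow" Kording alludes to.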
