2-min chapter


Neuroscientist Konrad Kording reveals shocking truth about machine learning and the brain

The Rhys Show

CHAPTER

The Evolution of Transformers

The architecture of transformers makes that considerably easier to scale. I mean, it's still a problem how to scale up really, really big transformer networks, but there was a time when it was exceptionally difficult to write parallel systems that transmit information between graphics cards. In fact, there are constraints on the flow of information within the system that we have, and by kind of minimizing the amount of necessary flow, scaling transformers is much easier than it was pre-transformers.

Interesting. Could you talk about something, because transformers, if you think about gradient descent as a crucial bit, like instead of writing all your learning...
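To make the parallelism point concrete, here is a minimal sketch (not from the episode; all names, shapes, and weights are illustrative assumptions) of single-head self-attention in NumPy. A transformer layer is essentially a few dense matrix multiplies with no step-by-step recurrence, which is why the computation shards naturally across devices and the necessary cross-device information flow stays small.

```python
# Minimal single-head self-attention sketch (illustrative only, not from the episode).
# Every matmul below is a dense, parallelizable operation over all tokens at once.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head). All shapes are made up."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # three independent projections
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # every token attends to every token
    return softmax(scores) @ V                 # no sequential recurrence anywhere

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))                   # 8 tokens, d_model = 16
Wq, Wk, Wv = (rng.normal(size=(16, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)     # -> (8, 4)
```

Contrast this with a recurrent network, where each token's state depends on the previous token's state and must be computed in order: that serial dependency is exactly the kind of information-flow constraint the transformer avoids.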

