4-minute chapter

Yann LeCun: Filling the Gap in Large Language Models

Eye On A.I.

CHAPTER

The History of Transformers

In the '90s, people were already working on things that we now call mixture of experts, and also multiplicative interactions. Then there were ideas of neural networks that have a separate module for computation and memory. And then attention mechanisms like this were popularized around 2015 by a paper from Yoshua Bengio's group at Mila. They are extremely powerful for doing things like language translation in NLP. That really started the craze on attention. So you combine all those ideas and you get a transformer, which uses something called self-attention, where the input tokens are used both as queries and keys in an associative memory, very much like a memory network. The advantage of transformers is
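As a rough illustration of the self-attention idea described above (not code from the episode), here is a minimal single-head sketch in NumPy: the same input tokens are projected into queries, keys, and values, and each token does a soft, content-addressed lookup over all the others, much like an associative memory. The function and weight names are illustrative assumptions, not any particular library's API.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention: each token's query is matched against
    every token's key, and the softmaxed scores mix the value vectors."""
    Q = X @ Wq  # queries derived from the input tokens
    K = X @ Wk  # keys derived from the same tokens (hence "self"-attention)
    V = X @ Wv  # values to be mixed
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # token-to-token similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V  # content-addressed lookup, like an associative memory

# Example: 4 tokens with 8-dimensional embeddings (random weights for illustration)
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated representation per input token
```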
