Yann LeCun: Filling the Gap in Large Language Models

Eye On A.I.

CHAPTER

The History of Transformers

In the '90s, people were already working on things that we now call mixture of experts, and also multiplicative interactions. Then there were ideas of neural networks that have a separate module for computation and memory. And then attention mechanisms like this were popularized around 2015 by a paper from Yoshua Bengio's group at Mila. They are extremely powerful for doing things like language translation in NLP. That really started the craze on attention. So you combine all those ideas and you get a transformer, which uses something called self-attention, where the input tokens are used both as queries and keys in an associative memory, very much like a memory network. The advantage of transformers is…
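To make the self-attention idea concrete, here is a minimal sketch of single-head scaled dot-product self-attention, where the same token embeddings are projected into queries, keys, and values, so each token looks itself up against an associative memory built from the whole sequence. This is an illustrative NumPy sketch, not code from the episode; the function name, shapes, and random projection matrices are assumptions.

```python
# Minimal single-head self-attention sketch (illustrative; shapes and names are assumed).
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention.

    x            : (seq_len, d_model) input token embeddings
    w_q, w_k, w_v: (d_model, d_k) projection matrices

    The same tokens supply the queries, keys, and values, which is what
    makes this "self" attention: each token queries an associative
    memory formed by the entire sequence.
    """
    q = x @ w_q                                       # queries
    k = x @ w_k                                       # keys
    v = x @ w_v                                       # values

    scores = q @ k.T / np.sqrt(k.shape[-1])           # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ v                                # weighted sum of values


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, d_model, d_k = 5, 16, 8
    x = rng.normal(size=(seq_len, d_model))
    w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
    out = self_attention(x, w_q, w_k, w_v)
    print(out.shape)  # (5, 8): one attended vector per input token
```

In a full transformer this is repeated across multiple heads and layers with learned projections, but the core lookup, tokens attending to other tokens as in an associative memory, is the piece described above.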
