The History of Transformers
In the 90s, people were already working on things that we now call mixture of experts, and also on multiplicative interactions. Then there were ideas of neural networks that have a separate module for computation and memory. Attention mechanisms of this kind were popularized around 2015 by a paper from Yoshua Bengio's group at Mila. They are extremely powerful for doing things like language translation in NLP, and that really started the craze around attention. So you combine all of those ideas and you get a transformer, which uses something called self-attention, where the input tokens are used both as queries and keys in an associative memory, very much like a memory network. The advantage of transformers is…
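To make the self-attention idea concrete, here is a minimal NumPy sketch (not from the episode) of single-head scaled dot-product self-attention. The same input tokens produce the queries, keys, and values, so each token does a soft associative-memory lookup over all the others; every name, shape, and the random initialization below is illustrative, not from the source.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention over a sequence of token embeddings.

    X:   (n, d) input token embeddings
    W_q, W_k, W_v: (d, d) projection matrices (learned in practice;
                   random here purely for illustration)
    """
    Q = X @ W_q  # queries come from the input tokens
    K = X @ W_k  # keys come from the same tokens
    V = X @ W_v  # values, also from the same tokens
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # scaled dot-product similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V  # weighted sum of values: a soft memory lookup

# Toy usage with random data: 4 tokens, 8-dimensional embeddings.
rng = np.random.default_rng(0)
n, d = 4, 8
X = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8): one updated vector per input token
```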