Self-Attention in Neural Turing Machines
One person called it a stack-augmented memory network. Another person called it a key-value memory network, and then there were a whole bunch of other variants. These use associative memories, which are the basic modules used inside transformers.

Attention mechanisms like this were popularized around 2015 by a paper from Yoshua Bengio's group at Mila, which demonstrated that they are extremely powerful for things like language translation in NLP. That started the craze on attention.

And so you combine all these ideas and you get the transformer, which uses something called self-attention, where the input tokens are used both as queries and keys in an associative memory, very much like a memory network. You can then view this as a layer, if you want: you put several of those in a layer, then you stack those layers, and that's what a transformer is.
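To make the idea concrete, here is a minimal sketch (NumPy only, with hypothetical shapes and names, not the speaker's or any library's actual implementation) of self-attention as a soft associative memory: each input token is projected into a query and a key, queries are matched against keys, and the resulting weights retrieve a blend of values. Several such layers are then stacked, roughly as described above; real transformers add multiple heads, feed-forward blocks, and layer normalization.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    """tokens: (n, d) array of token embeddings; Wq, Wk, Wv: (d, d) projections."""
    Q = tokens @ Wq                            # queries come from the input tokens...
    K = tokens @ Wk                            # ...and so do the keys (hence "self"-attention)
    V = tokens @ Wv                            # values stored in the associative memory
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # similarity of every query to every key
    weights = softmax(scores, axis=-1)         # soft "address" into the memory
    return weights @ V                         # weighted retrieval of values

# Stack a few self-attention layers (toy sizes, random weights for illustration).
rng = np.random.default_rng(0)
n, d, n_layers = 5, 16, 3
x = rng.normal(size=(n, d))                    # 5 token embeddings of width 16
for _ in range(n_layers):
    Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    x = x + self_attention(x, Wq, Wk, Wv)      # residual connection around each layer
print(x.shape)                                 # (5, 16): same tokens, now contextualized
```

The key point the sketch illustrates is that the same tokens supply the queries, the keys, and the values, so the "memory" being addressed is the input sequence itself.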