Attention Is All You Need
The original transformer paper is titled "Attention Is All You Need", and attention really is the key aspect of what makes a transformer work. What attention does is take that distribution — a probability for each of the five words — and feed it back into the model itself. So imagine I have a sentence like "the man walked the dog" and I want to predict the next word. Those previous five words are the five items I'd be choosing from, and attention asks: how much weight should I give to each of those five words when deciding on the next one?
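The weighting step described above can be sketched in a few lines of Python. This is a toy illustration, not the full transformer mechanism: the 2-d embeddings are made up for the example, and it uses a single dot-product attention step where the last word acts as the query over all five words in the sentence.

```python
import math

def softmax(scores):
    # Exponentiate and normalise so the weights sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    # Score each word by its dot product with the query,
    # then turn the scores into a probability distribution.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)

# Made-up 2-d embeddings for the words in "the man walked the dog".
embeddings = {
    "the":    [0.1, 0.0],
    "man":    [0.9, 0.1],
    "walked": [0.3, 0.8],
    "dog":    [0.8, 0.4],
}
sentence = ["the", "man", "walked", "the", "dog"]
keys = [embeddings[w] for w in sentence]

# Use the last word's embedding as the query: how much should each
# of the five previous words influence the next-word prediction?
query = embeddings["dog"]
weights = attention_weights(query, keys)
for word, w in zip(sentence, weights):
    print(f"{word}: {w:.3f}")
```

The five printed weights sum to 1, so they form exactly the kind of distribution over previous words that the speaker describes being fed back into the model.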