
98 - Analyzing Information Flow In Transformers, With Elena Voita
NLP Highlights
The Importance of Attention in the Model
So we can prune, like, two thirds of all heads with all the specialized functions still alive. So basically all the functions are alive until we have the seven heads. And then if we push further, we start losing several functions, for example, rare tokens and syntactic functions.

That's really interesting. Did you notice what the heads that actually survive the pruning process are performing?
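To make the pruning idea concrete, here is a minimal sketch (not the paper's implementation, which learns the gates with an L0-style relaxation): self-attention where each head has a scalar gate, so setting a gate to zero removes that head while the surviving heads keep working. All names and the manual gate-zeroing are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GatedMultiHeadSelfAttention(nn.Module):
    """Self-attention with one scalar gate per head; a zero gate prunes that head."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One gate per head; 1.0 = keep the head, 0.0 = pruned.
        self.gates = nn.Parameter(torch.ones(n_heads))

    def forward(self, x):                                  # x: (batch, seq, d_model)
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq, d_head) for per-head attention.
        split = lambda h: h.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)
        att = (q @ k.transpose(-2, -1)) / self.d_head ** 0.5
        ctx = att.softmax(dim=-1) @ v                      # (batch, heads, seq, d_head)
        # Multiply each head's output by its gate; zeroed heads contribute nothing.
        ctx = ctx * self.gates.view(1, self.n_heads, 1, 1)
        ctx = ctx.transpose(1, 2).reshape(b, t, -1)
        return self.out(ctx)

# Example: "prune" two thirds of the heads by zeroing their gates by hand.
mha = GatedMultiHeadSelfAttention(d_model=512, n_heads=8)
with torch.no_grad():
    mha.gates[:6] = 0.0                                    # keep only 2 of 8 heads
y = mha(torch.randn(1, 10, 512))
```

In the setting discussed in the episode, the gates are not set by hand but learned jointly with the translation objective under a sparsity penalty, which is what drives most heads to zero while the specialized ones survive.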