
98 - Analyzing Information Flow In Transformers, With Elena Voita
NLP Highlights
00:00
Train From Scratch Model Heads?
Based on these results, what do you think we can change about the way we build these models? Is there anything else you think we should do differently? I see several possible directions of research. First, I haven't mentioned it before, but I've also looked at whether we can train from scratch a model with the same configuration of heads as the pruned ones. We found that, no, we cannot. It's better to prune a larger model than to train a model of the same size from scratch. And there can be connections to the lottery ticket hypothesis.