
129 - Transformers and Hierarchical Structure, with Shunyu Yao
NLP Highlights
The Scalar Positional Encoding Is Important for Formal Languages
For this Dyck-language-processing transformer to work, we actually assume a very special kind of positional encoding. We call this scalar positional encoding, or scalar PE. The idea is that you have a separate, fixed dimension in your token embedding that ranges from zero to one. And experimentally, we show that with that kind of scalar positional encoding, transformers can actually learn from finite samples and generalize to longer lengths of input. But with more traditional positional encodings, like learnable embeddings or, you know, Fourier features, cosine and sine…
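The scalar PE described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function name and the exact normalization (position i divided by sequence length n) are assumptions; the episode only specifies a single fixed dimension ranging from zero to one.

```python
import numpy as np

def append_scalar_pe(token_embeddings: np.ndarray) -> np.ndarray:
    """Append one extra dimension holding a scalar position in [0, 1).

    token_embeddings: (n, d) array, one row per token.
    Returns an (n, d + 1) array whose last column is i / n for token i.
    The i / n normalization is an illustrative assumption.
    """
    n = token_embeddings.shape[0]
    positions = np.arange(n) / n  # 0, 1/n, 2/n, ... for each token
    return np.concatenate([token_embeddings, positions[:, None]], axis=1)

# Toy usage: 4 tokens with 3-dimensional embeddings.
emb = np.zeros((4, 3))
out = append_scalar_pe(emb)
print(out.shape)   # (4, 4)
print(out[:, -1])  # [0.   0.25 0.5  0.75]
```

Because the position is a single bounded scalar rather than a learned vector tied to absolute indices, the same encoding scheme applies unchanged to sequences longer than any seen in training, which is the length-generalization point made in the episode.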