
129 - Transformers and Hierarchical Structure, with Shunyu Yao
NLP Highlights
Is the Scalar Positional Encoding Important for Formal Languages?
For all this, like a Dyck-language-processing transformer, to work, we actually assume a very special kind of positional encoding. We call this scalar positional encoding, or scalar PE. So the idea is that you have this separate, fixed dimension in your token embedding that ranges from zero to one. And also, experimentally, we show that with that kind of scalar positional encoding, transformers can actually learn from finite samples and generalize to longer input lengths. But with more traditional positional encodings, like learnable embeddings, or, you know, like Fourier features, cosine and sine...
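To make the idea concrete, here is a minimal sketch of a scalar positional encoding as described: a single extra embedding dimension whose value increases from zero to one across the sequence. The normalization i/n, the function name, and the toy shapes are assumptions for illustration; the episode only specifies that the dimension ranges from zero to one.

```python
import numpy as np

def scalar_positional_encoding(token_embeddings: np.ndarray) -> np.ndarray:
    """Append one scalar position dimension to each token embedding.

    Position i is mapped to i / n, so values lie in [0, 1). This i/n
    normalization is a hypothetical choice; the episode only states
    that the extra dimension ranges from zero to one.
    """
    n, d = token_embeddings.shape
    positions = np.arange(n, dtype=token_embeddings.dtype) / n
    # Concatenate the scalar position as an extra (d+1)-th dimension.
    return np.concatenate([token_embeddings, positions[:, None]], axis=1)

# Usage: a toy sequence of 5 tokens with 4-dimensional embeddings.
emb = np.random.randn(5, 4).astype(np.float32)
out = scalar_positional_encoding(emb)
print(out.shape)  # (5, 5): the original 4 dims plus one scalar position dim
```

Unlike sinusoidal or learned positional embeddings, this encoding uses a single monotone scalar, which is the property the speakers tie to length generalization on formal languages.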