
129 - Transformers and Hierarchical Structure, with Shunyu Yao

NLP Highlights

00:00

Scalar Positional Encoding Is Important for Formal Languages

For all this, like a Dyck-processing transformer, to work, we actually assume a very special kind of positional encoding. We call this scalar positional encoding, or scalar PE. So the idea is that you have this separate, fixed dimension in your token embedding that ranges from zero to one. And also, experimentally, we show that with that kind of scalar positional encoding, transformers can actually learn from finite samples and generalize to longer input lengths. But with more traditional positional encodings, like learnable embeddings, or, you know, Fourier features, cosine and sine…
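As described here, scalar PE adds a single extra dimension to each token embedding holding the token's normalized position. The sketch below is a minimal illustration of that idea; the exact normalization (i/n versus i/(n-1)) and whether the scalar is concatenated or written into a reserved dimension are assumptions, not details confirmed in this clip.

```python
import torch


def add_scalar_pe(token_embeddings: torch.Tensor) -> torch.Tensor:
    """Append a scalar positional encoding as one extra embedding dimension.

    Hypothetical sketch of the scalar PE described in the episode: position i
    of a length-n sequence is encoded as the single scalar i/n in [0, 1),
    concatenated to the token embedding, in contrast to learned position
    embeddings or sinusoidal (cosine/sine) Fourier features.
    """
    batch, n, _ = token_embeddings.shape
    # Scalar positions 0/n, 1/n, ..., (n-1)/n: one fixed dimension in [0, 1).
    positions = torch.arange(n, dtype=token_embeddings.dtype) / n
    scalar_pe = positions.view(1, n, 1).expand(batch, n, 1)
    return torch.cat([token_embeddings, scalar_pe], dim=-1)


# Example: a batch of 2 sequences, length 8, embedding dim 16 -> dim 17.
x = torch.randn(2, 8, 16)
print(add_scalar_pe(x).shape)  # torch.Size([2, 8, 17])
```

Because the position is a single bounded scalar rather than a lookup into a fixed-size table, a model trained on short sequences can, per the claim in the clip, be evaluated on longer inputs without encountering positions it has never seen.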

