
129 - Transformers and Hierarchical Structure, with Shunyu Yao

NLP Highlights


Generalization to Longer Sequence Lengths?

In our theoretical construction, generalizing from shorter inputs to longer inputs is kind of automatic, because that is the fundamental strength of the distributed way of sequence processing. Whereas if you want to generalize from a smaller depth to a larger depth, for the hard-attention construction it requires stacking up more layers, and for the soft-attention network it actually requires some details about how you represent the depth information. It really depends on what is the maximum depth that you count on.
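
To make the depth question concrete, here is a minimal sketch (not the construction discussed in the episode) of how nesting depth evolves over a bracket sequence, and how a depth encoding built for a fixed maximum depth stops working once the input nests deeper than that bound. The Dyck-style bracket example, the `max_depth` parameter, and the one-hot encoding are illustrative assumptions, not the paper's actual representation.

```python
# Illustrative sketch (assumptions, not the episode's construction):
# track the nesting depth of a bracket sequence, and encode each
# position's depth as a one-hot vector of size max_depth + 1.
# Inputs that nest deeper than max_depth cannot be represented,
# which is why a bound on depth matters for this kind of encoding.

from typing import List


def depths(brackets: str) -> List[int]:
    """Nesting depth after reading each token of a bracket string."""
    depth, out = 0, []
    for tok in brackets:
        depth += 1 if tok == "(" else -1
        out.append(depth)
    return out


def one_hot_depth(d: int, max_depth: int) -> List[int]:
    """One-hot depth embedding, only defined up to a fixed max_depth."""
    if d > max_depth:
        raise ValueError(f"depth {d} exceeds the assumed maximum {max_depth}")
    vec = [0] * (max_depth + 1)
    vec[d] = 1
    return vec


if __name__ == "__main__":
    seq = "(()(()))"
    print(depths(seq))                    # [1, 2, 1, 2, 3, 2, 1, 0]
    print(one_hot_depth(3, max_depth=4))  # fine: depth within the bound
    # one_hot_depth(5, max_depth=4)       # would fail: deeper than assumed
```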
