
98 - Analyzing Information Flow In Transformers, With Elena Voita

NLP Highlights


The Difference Between MLM and Next-Token Prediction

A masked language model is trained to predict the current token's identity. During training, most of the time the masked token is replaced with a mask token or a random token, so the model is trained to first, like, accumulate information about the context, and then reconstruct the token's identity. In these experiments with the masked language model, we take representations as at training time, where tokens are masked or replaced, to get cases where input and output are different. What we see is that there are actually two processes going on: losing information about the input while at the same time accumulating information about the output, since the output is different.
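The masking scheme described here can be sketched in code. This is a minimal illustration of BERT-style token corruption, not the speaker's actual pipeline; the `MASK_ID`, the 15% selection rate, and the 80/10/10 split are the standard published defaults, assumed here for concreteness. Positions where the input was masked or replaced are exactly the cases where input and output differ, so the model must reconstruct the original identity from context.

```python
import random

MASK_ID = 103  # placeholder id for the [MASK] token (assumed vocabulary)

def mask_tokens(token_ids, vocab_size, mask_prob=0.15, rng=None):
    """BERT-style corruption: select ~15% of positions; of those,
    80% become [MASK], 10% become a random token, 10% stay unchanged.

    Returns (inputs, labels): labels hold the original token at corrupted
    positions and -100 elsewhere (a value conventionally ignored by the loss),
    so the training target is the token's original identity."""
    rng = rng or random.Random(0)
    inputs, labels = list(token_ids), [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tok  # the model must reconstruct this identity
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID                    # input != output
            elif r < 0.9:
                inputs[i] = rng.randrange(vocab_size)  # random replacement
            # else: token kept unchanged, but still predicted
    return inputs, labels
```

Probing the encoder's representations at these corrupted positions, as the speaker describes, is what reveals the two concurrent processes: the input token's identity fades from the representation while the target token's identity builds up.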

