
The Limits of NLP

Data Skeptic


The Parameter-FLOP Trade-Off Between Decoder-Only and Encoder-Decoder Language Models

The paper tries to be a little more rigorous when we're comparing decoder-only language models to an encoder-decoder model. We try to compare them not only in terms of the number of parameters, but also the number of FLOPs it takes to process a sequence. And one of the interesting findings from exploring this parameter-FLOP trade-off was that if you actually share the parameters in the encoder and the decoder, the performance doesn't drop very much. So that's just one interesting tidbit. Another takeaway that was a little surprising to me was that across many different unsupervised objectives, we didn't find a huge difference in performance as long as you were…
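The comparison described above can be sketched with some rough back-of-the-envelope accounting. The snippet below is a minimal illustration, not the paper's exact methodology: it assumes a crude per-layer parameter formula and the common approximation that forward-pass FLOPs scale with the parameters each token actually passes through, to show why sharing encoder and decoder parameters roughly halves the parameter count without changing the FLOPs per sequence.

```python
# Toy parameter/FLOP accounting for decoder-only vs. encoder-decoder models.
# All formulas here are rough assumptions for illustration, not the paper's numbers.

def stack_params(layers, d_model, d_ff):
    """Crude per-stack count: self-attention (~4*d^2) + feed-forward (2*d*d_ff) per layer.
    Cross-attention and embeddings are ignored for simplicity."""
    return layers * (4 * d_model**2 + 2 * d_model * d_ff)

def decoder_only(layers, d_model, d_ff, seq_len):
    p = stack_params(layers, d_model, d_ff)
    flops = 2 * p * seq_len                       # every token passes through the whole stack
    return p, flops

def encoder_decoder(layers, d_model, d_ff, src_len, tgt_len, share=False):
    enc = stack_params(layers, d_model, d_ff)
    dec = stack_params(layers, d_model, d_ff)
    params = enc if share else enc + dec          # sharing halves the parameter count
    flops = 2 * (enc * src_len + dec * tgt_len)   # each token only visits one of the two stacks
    return params, flops

if __name__ == "__main__":
    results = {
        "decoder-only":       decoder_only(12, 768, 3072, seq_len=512),
        "enc-dec (separate)": encoder_decoder(12, 768, 3072, 256, 256),
        "enc-dec (shared)":   encoder_decoder(12, 768, 3072, 256, 256, share=True),
    }
    for name, (p, f) in results.items():
        print(f"{name:20s} params={p / 1e6:7.1f}M  flops/seq={f / 1e9:7.1f}G")
```

Under these toy assumptions, the separate encoder-decoder has about twice the parameters of the decoder-only model at the same FLOPs per sequence, and sharing the two stacks brings the parameter count back down while leaving the FLOPs unchanged, which is the trade-off the excerpt is pointing at.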

