The Parameter-FLOP Trade-Off Between Decoder-Only and Encoder-Decoder Language Models
The paper tries to be a little more rigorous when comparing decoder-only language models to an encoder-decoder model. We try to compare them not only in terms of the number of parameters, but also the number of FLOPs it takes to process a sequence. And one of the interesting findings from exploring this parameter-FLOP trade-off was that if you actually share the parameters in the encoder and the decoder, the performance doesn't drop very much. So that's just one interesting tidbit. Another takeaway that was a little surprising to me was that across many different unsupervised objectives, we didn't find a huge difference in performance as long as you were…
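For context, here is a rough back-of-envelope sketch (not from the paper's code) of the parameter/FLOP accounting being described. The layer count, model width, the "~12·d_model² parameters per Transformer block" rule of thumb, and the "~2 FLOPs per parameter touched per token" forward-pass approximation are all assumptions for illustration only.

```python
# Back-of-envelope comparison of parameters vs. per-token FLOPs for the
# architectures discussed: decoder-only, encoder-decoder, and an
# encoder-decoder whose encoder and decoder share (tie) their parameters.

def stack_params(num_layers: int, d_model: int) -> int:
    """Rough parameter count for one Transformer stack: ~12 * d_model^2 per layer."""
    return 12 * d_model ** 2 * num_layers

D, L = 768, 12          # illustrative width and layers-per-stack
P = stack_params(L, D)  # parameters of a single L-layer stack

configs = {
    # name:                        (total params, params each token passes through)
    "decoder-only, L layers":      (P,     P),
    "enc-dec, L+L layers":         (2 * P, P),  # a token goes through encoder OR decoder
    "enc-dec, L+L, shared params": (P,     P),  # encoder and decoder tie their weights
}

for name, (total, active) in configs.items():
    # Forward-pass FLOPs per token ~ 2 * parameters the token passes through.
    print(f"{name:30s} {total/1e6:6.1f}M params, ~{2*active/1e6:6.1f}M FLOPs/token")
```

Under this rough accounting, all three configurations spend about the same FLOPs per token, but the shared-parameter encoder-decoder matches the decoder-only model's parameter count as well, which is why "sharing the parameters costs little performance" is an interesting point on the parameter-FLOP trade-off.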