The Parameter-FLOP Trade-Off Between Decoder-Only and Encoder-Decoder Language Models
The paper tries to be a little more rigorous when comparing decoder-only language models to an encoder-decoder model. We try to compare them not only in terms of the number of parameters, but also the number of FLOPs it takes to process a sequence. And one of the interesting findings from exploring this parameter-FLOP trade-off was that if you actually share the parameters in the encoder and the decoder, the performance doesn't drop very much. So that's just one interesting tidbit. Another takeaway that was a little surprising to me was that across many different unsupervised objectives, we didn't find a huge difference in performance as long as you were…
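For context, here is a rough back-of-envelope sketch (not from the paper's code) of the parameter/FLOP accounting being described. The layer count, model width, the "~12·d_model² parameters per Transformer block" rule of thumb, and the "~2 FLOPs per parameter touched per token" forward-pass approximation are all assumptions for illustration only.

```python
# Back-of-envelope comparison of parameters vs. per-token FLOPs for the
# architectures discussed: decoder-only, encoder-decoder, and an
# encoder-decoder whose encoder and decoder share (tie) their parameters.

def stack_params(num_layers: int, d_model: int) -> int:
    """Rough parameter count for one Transformer stack: ~12 * d_model^2 per layer."""
    return 12 * d_model ** 2 * num_layers

D, L = 768, 12          # illustrative width and layers-per-stack
P = stack_params(L, D)  # parameters of a single L-layer stack

configs = {
    # name:                        (total params, params each token passes through)
    "decoder-only, L layers":      (P,     P),
    "enc-dec, L+L layers":         (2 * P, P),  # a token goes through encoder OR decoder
    "enc-dec, L+L, shared params": (P,     P),  # encoder and decoder tie their weights
}

for name, (total, active) in configs.items():
    # Forward-pass FLOPs per token ~ 2 * parameters the token passes through.
    print(f"{name:30s} {total/1e6:6.1f}M params, ~{2*active/1e6:6.1f}M FLOPs/token")
```

Under this rough accounting, all three configurations spend about the same FLOPs per token, but the shared-parameter encoder-decoder matches the decoder-only model's parameter count as well, which is why "sharing the parameters costs little performance" is an interesting point on the parameter-FLOP trade-off.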