The Evolution of the Decoder Only Transformer in Language Modeling
Great to hear, yeah, definitely. When there's such a big team it's good to hear that it all came together. Now, speaking a bit about the architecture and the results, I was also wondering about some of the things you might do next, or what you thought of doing as far as some of these empirical studies. So one thing I was thinking is, for very large models, maybe they're not usable at deployment time, so people would use pruned models, you know, or otherwise optimized models. Do these trends hold, or do these pruned models inherently perform worse?