The Anomaly of Language Models

When we giggle at one of these models making a silly mistake, keep in mind that it's not doing the thing we're doing in day-to-day life. It's playing the token prediction game. Everything it knows must be memorised and encoded in its fixed weights. To be fair, Gato is only superhuman in some of those tasks. That's comforting, right? But if we ask a transformer to do 604 things at once, it's not too crazy. Phew! Oh, wait. The largest model they tested only had 0.21% as many parameters as the largest PaLM model. And they do an absurdly good job anyway.
