
5. Charlie Snell on DALL-E and CLIP

The Inside View


Scaling Laws in Language Models

The reason it works so well is just that you can scale transformers way up and they get better. So if you're trying to be precise, from this picture, the loss decreases. Do you think the scaling laws will last forever? Hold on, I'm not sure how to really grasp it. It's basically saying: compute go up, language model go brrr.
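For readers unfamiliar with the "loss decreases as you scale" picture the quote refers to, here is a minimal sketch of the power-law form popularized by Kaplan et al. (2020). The constants, exponent, and function names below are illustrative assumptions for the sketch, not values taken from the episode.

```python
# Sketch of the scaling-law picture: test loss falls as a power law in
# training compute, L(C) = (C_c / C) ** alpha. The constants here are
# placeholders chosen for illustration, not fitted values.

def predicted_loss(compute_pf_days: float,
                   c_critical: float = 2.3e8,  # assumed scale constant (PF-days)
                   alpha: float = 0.050) -> float:
    """Return the power-law loss estimate for a given training compute."""
    return (c_critical / compute_pf_days) ** alpha

# "Compute go up, language model go brrr": more compute, lower predicted loss.
for c in (1.0, 10.0, 100.0, 1000.0):
    print(f"compute = {c:7.1f} PF-days -> predicted loss ~ {predicted_loss(c):.3f}")
```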
