5. Charlie Snell on DALL-E and CLIP

The Inside View

CHAPTER

Scaling Laws in Language Models

The reason it works so well is just that you can scale transformers way up and they get better. As you can see from this picture, the loss decreases. Do you think the scaling laws will last forever? Hold on, I'm not sure how to really phrase it. It's basically saying: compute go up, language model go brrr.
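For context (this formula is not stated in the episode), the scaling laws being referenced are usually written as a power law relating loss to training compute. A minimal sketch, assuming the form reported by Kaplan et al. (2020), where L is the language model's cross-entropy loss and C is training compute:

% Sketch of a compute scaling law, assuming the Kaplan et al. (2020) form.
% L = cross-entropy test loss, C = training compute,
% C_c and \alpha_C are empirically fitted constants
% (\alpha_C is small, on the order of 0.05 in their fits).
\[
  L(C) \approx \left( \frac{C_c}{C} \right)^{\alpha_C}
\]
% "Compute go up, language model go brrr": as C grows,
% L falls smoothly along this power law.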

