
5. Charlie Snell on DALL-E and CLIP
The Inside View
Scaling Laws in Language Models
The reason it works so well is that you can just scale transformers way up and they get better. As you can see from this picture, the loss decreases. Do you think the scaling laws will last forever? Hold on, I'm not sure how to really phrase it. It's basically saying: compute go up, language model go brrr.
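The "compute go up, loss go down" intuition can be sketched as a simple power law in the style of neural scaling-law papers. The constants `c_c` and `alpha` below are illustrative assumptions, not fitted values from any particular study:

```python
def scaling_loss(compute, c_c=2.3e8, alpha=0.050):
    """Toy power-law scaling curve: loss = (c_c / compute) ** alpha.

    compute: training compute (arbitrary units)
    c_c, alpha: illustrative constants (assumptions, not fitted values)
    """
    return (c_c / compute) ** alpha

# More compute -> lower loss, monotonically.
for c in (1e6, 1e9, 1e12):
    print(f"compute={c:.0e}  loss={scaling_loss(c):.3f}")
```

The key property, for the purposes of the discussion above, is just monotonicity: as long as the power law holds, every increase in compute buys a predictable decrease in loss.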