Scaling Laws in Language Models
The reason it works so well is just that you can scale transformers way up and they get better. So, to be precise about this picture: as compute increases, the loss decreases. Do you think the scaling law will last forever? Hold on, I'm not sure how to really grasp it. It's basically saying: compute go up, language model go brrr.
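The relationship the speakers are gesturing at is usually modeled as a power law in compute. A minimal sketch of that shape is below; the coefficients (`a`, `alpha`, `l_inf`) are hypothetical values chosen only to illustrate the curve, not fitted to any real model family.

```python
# Hedged sketch: scaling laws are commonly written as a power law,
#   L(C) = a * C**(-alpha) + L_inf
# where C is training compute and L_inf is an irreducible loss floor.
# All coefficients here are made up for illustration.

def predicted_loss(compute, a=10.0, alpha=0.05, l_inf=1.7):
    """Predicted LM loss at a given compute budget (hypothetical coefficients)."""
    return a * compute ** (-alpha) + l_inf

# More compute -> lower loss, with diminishing returns toward the floor l_inf.
budgets = [10.0 ** e for e in range(18, 25)]  # FLOPs, 1e18 .. 1e24
losses = [predicted_loss(c) for c in budgets]
assert all(hi > lo for hi, lo in zip(losses, losses[1:]))
```

The curve falls monotonically but flattens: each order of magnitude of compute buys a smaller loss improvement, which is exactly what "the loss decreases" with scale means in these plots.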