The Parallel Relationship Between Model Loss and Cross Entropy
If you increase the compute by a factor of 10, you always see the same fractional decrease in the loss. How big that fractional decrease is depends on the domain. And so it's a power-law relationship, which is why it looks linear on a log-log plot.
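
To make the "same fractional decrease per 10x of compute" claim concrete, here is a minimal sketch. It assumes the loss follows a pure power law, L(C) = a * C^(-alpha), with no irreducible-loss floor. The constants `a` and `alpha` and the function names are hypothetical, picked only for illustration, and are not taken from the episode.

```python
import math

def loss(compute, a=10.0, alpha=0.05):
    """Hypothetical power-law loss curve: L(C) = a * C**(-alpha)."""
    return a * compute ** (-alpha)

def fractional_decrease(compute, factor=10.0, a=10.0, alpha=0.05):
    """Fraction by which the loss drops when compute is multiplied by `factor`."""
    before = loss(compute, a, alpha)
    after = loss(compute * factor, a, alpha)
    return 1.0 - after / before

def loglog_slope(c1, c2, a=10.0, alpha=0.05):
    """Slope of log(loss) against log(compute) between two compute budgets."""
    rise = math.log(loss(c2, a, alpha)) - math.log(loss(c1, a, alpha))
    run = math.log(c2) - math.log(c1)
    return rise / run

# The drop is the same at every starting budget: 1 - 10**(-alpha).
drops = [fractional_decrease(c) for c in (1e3, 1e6, 1e9)]

# On log-log axes the curve is a straight line with slope -alpha.
slopes = [loglog_slope(1e3, 1e6), loglog_slope(1e6, 1e12)]

if __name__ == "__main__":
    for c, d in zip((1e3, 1e6, 1e9), drops):
        print(f"compute {c:.0e} -> 10x more: loss falls by {d:.2%}")
    print("log-log slopes:", slopes)
```

Under this assumption the fractional drop depends only on `alpha`, so a different domain would correspond to a different `alpha` (a steeper or shallower line on the log-log plot), while the straight-line shape stays the same.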