
161: Leveraging Generative AI Models with Hagay Lupesko
Programming Throwdown
The Scaling Laws of Machine Learning
A paper published by DeepMind basically talks about the scaling laws, which all refer to very similar architectures. It showed that a model like OPT or even GPT-3 was actually undertrained for its size. You can take a smaller model with fewer parameters, train it on more data, and it will perform just as well as, or even better than, a bigger model trained on less data.
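The finding described here matches DeepMind's Chinchilla paper, whose rough rule of thumb is about 20 training tokens per model parameter. A minimal sketch of that arithmetic in Python, assuming the ~20x heuristic (the exact coefficient depends on the fitted scaling law, so treat it as an approximation):

    # Chinchilla rule of thumb: ~20 training tokens per parameter.
    # The factor of 20 is an approximate heuristic, not an exact law.
    TOKENS_PER_PARAM = 20

    def compute_optimal_tokens(n_params: float) -> float:
        """Approximate token budget for a compute-optimal training run."""
        return TOKENS_PER_PARAM * n_params

    for name, params in [("GPT-3 (175B)", 175e9), ("Chinchilla (70B)", 70e9)]:
        tokens = compute_optimal_tokens(params)
        print(f"{name}: ~{tokens / 1e12:.1f}T tokens for compute-optimal training")

    # GPT-3 was trained on roughly 300B tokens, far below the ~3.5T this
    # heuristic suggests -- the sense in which it was "undertrained".

This is why a 70B-parameter model trained on 1.4T tokens (Chinchilla) can match or beat a 175B-parameter model trained on far less data.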