Is There a Theory for Convex Optimization?
When you have a model and you're trying to train it, should you be optimizing the hyperparameters or should you be adding more data? That's the trade-off, put in very crude terms. And we know that a lot of these large language models, BERT for example, are just not properly converged: a large number of their layers are simply undertrained.
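One way to make the "undertrained layers" point concrete is to look at how much training signal each layer actually receives. Below is a minimal sketch, assuming PyTorch and HuggingFace `transformers` are installed; it uses per-layer gradient norms as a rough proxy for how strongly each encoder layer is being trained. This diagnostic is my illustration, not a method described in the episode, and a single toy batch is used purely for demonstration (in practice you would average norms over many batches).

```python
# Minimal sketch: per-layer gradient norms as a crude proxy for training signal.
# Assumptions: PyTorch + HuggingFace transformers; bert-base-uncased as the model.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# One masked-LM step on a toy sentence. (Using the full input_ids as labels
# computes the loss over every position; fine for a quick diagnostic.)
inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
labels = inputs["input_ids"].clone()
outputs = model(**inputs, labels=labels)
outputs.loss.backward()

# Sum gradient norms within each encoder layer. Layers whose norms stay
# consistently small across many batches are candidates for being undertrained.
for i, layer in enumerate(model.bert.encoder.layer):
    norm = sum(p.grad.norm().item() for p in layer.parameters() if p.grad is not None)
    print(f"layer {i:2d}: total grad norm = {norm:.4f}")
```

Run over a real training stream rather than one sentence, a plot of these norms per layer would show whether some layers barely move, which is one reading of "not properly converged."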