Is There a Theory for Convex Optimization?
When you have a model and you're trying to train it, should you be optimizing the hyperparameters or should you be adding more data? That's the trade-off, put in very crude terms. And we know that a lot of these large language models, BERT for example, are just not properly converged: a large number of their layers are simply undertrained.
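One way to make the "undertrained layers" point concrete is to look at how much training signal each layer actually receives. Below is a minimal sketch, assuming PyTorch and HuggingFace `transformers` are installed; it uses per-layer gradient norms as a rough proxy for how strongly each encoder layer is being trained. This diagnostic is my illustration, not a method described in the episode, and a single toy batch is used purely for demonstration (in practice you would average norms over many batches).

```python
# Minimal sketch: per-layer gradient norms as a crude proxy for training signal.
# Assumptions: PyTorch + HuggingFace transformers; bert-base-uncased as the model.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# One masked-LM step on a toy sentence. (Using the full input_ids as labels
# computes the loss over every position; fine for a quick diagnostic.)
inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
labels = inputs["input_ids"].clone()
outputs = model(**inputs, labels=labels)
outputs.loss.backward()

# Sum gradient norms within each encoder layer. Layers whose norms stay
# consistently small across many batches are candidates for being undertrained.
for i, layer in enumerate(model.bert.encoder.layer):
    norm = sum(p.grad.norm().item() for p in layer.parameters() if p.grad is not None)
    print(f"layer {i:2d}: total grad norm = {norm:.4f}")
```

Run over a real training stream rather than one sentence, a plot of these norms per layer would show whether some layers barely move, which is one reading of "not properly converged."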