The Importance of Intuition in Model Learning
With GPT-3 we're talking about 175 billion parameters, at least in the original version of GPT-3. And I think it's such a shockingly large number of things being learned, essentially numbers in the model being learned. So you could argue that even for something like GPT-3 we were still in the regime that the statistics people are used to, and the situation wouldn't be too bad, because the amount of training data has been quite large compared to the number of parameters. But when you have neural networks with far more parameters than labels, how can those generalize? This happens due to properties of the optimization algorithm, stochastic gradient descent.
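The point about the optimizer can be illustrated with a minimal sketch (not from the episode; all names and numbers here are illustrative). In a linear least-squares problem with more parameters than labels, there are infinitely many weight vectors that fit the training data exactly, yet plain gradient descent started from zero converges to one particular interpolating solution, the minimum-norm one. This implicit bias of the optimizer, rather than an explicit regularizer, is one concrete example of the effect being described:

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized regime: more parameters (d) than training labels (n).
n, d = 20, 100
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true  # noiseless labels

# Plain gradient descent on the squared loss, initialized at zero.
w = np.zeros(d)
lr = 0.01
for _ in range(20000):
    grad = X.T @ (X @ w - y) / n
    w -= lr * grad

# Among the infinitely many interpolating solutions, GD from zero
# converges to the minimum-norm one: X^T (X X^T)^{-1} y.
w_min_norm = X.T @ np.linalg.solve(X @ X.T, y)

train_err = np.abs(X @ w - y).max()          # ~0: training data fit exactly
bias_gap = np.linalg.norm(w - w_min_norm)    # ~0: GD picked the min-norm solution
print(train_err, bias_gap)
```

The same qualitative picture is what the implicit-regularization literature argues for deep networks trained with SGD: the optimizer's trajectory, not the parameter count alone, determines which of the many zero-loss solutions is reached.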