With GPT-3, we're talking about 175 billion parameters, at least in the original version of GPT-3. And I think it's such a shockingly large number of things being learned, essentially numbers in the model being learned. So you could argue that even if we were in the regime that statistics people are used to, for something like GPT-3 the situation wouldn't be too bad, because the amount of data has been quite large compared to the parameters. Even when you have neural networks with far more parameters than labels, how can those generalize? This happens due to properties of the optimization algorithm, stochastic gradient descent.
Can machines actually be intelligent? What sorts of tasks are narrower or broader than we usually believe? GPT-3 was trained to do a "single" task: predicting the next word in a body of text; so why does it seem to understand so many things? What's the connection between prediction and comprehension? What breakthroughs happened in the last few years that made GPT-3 possible? Will academia be able to stay on the cutting edge of AI research? And if not, then what will its new role be? How can an AI memorize actual training data but also generalize well? Are there any conceptual reasons why we couldn't make AIs increasingly powerful by just scaling up data and computing power indefinitely? What are the broad categories of dangers posed by AIs?
Ilya Sutskever is Co-founder and Chief Scientist of OpenAI, which aims to build artificial general intelligence that benefits all of humanity. He leads research at OpenAI and is one of the architects behind the GPT models. Prior to OpenAI, Ilya was co-inventor of AlexNet and Sequence to Sequence Learning. He earned his Ph.D. in Computer Science from the University of Toronto. Follow him on Twitter at @ilyasut.