Speaker 2
In a human-centric sense, it feels like GPT-3 hasn't learned anything that could be used to reason. But that might be just the early days.
Speaker 1
Yeah, I think that's correct. I think the forms of reasoning that you see it perform are basically just reproducing patterns that it has seen in string data. So of course, if it's trained on the entire web, then it can produce an illusion of reasoning in many different situations. But it will break down if it's presented with a novel situation.
Speaker 2
That's the open question between
Speaker 1
the illusion of reasoning and actual reasoning, yeah. Yes, the power to adapt to something that is genuinely new. Because the thing is, even if you imagine you could train on every bit of data ever generated by humanity, that model would be capable of anticipating many different possible situations, but it remains that the future is going to be something different. For instance, if you train a GPT-3 model on data from the year 2002 and then use it today, it's going to be missing many things. It's going to be missing many common-sense facts about the world. It's even going
Speaker 2
to be missing vocabulary and so on. It's interesting that GPT-3 doesn't even have, I think, any information about the coronavirus. Yes.
Speaker 1
Which is why, you know, you can tell that a system is intelligent when it's capable of adapting. So intelligence is going to require some amount of continuous learning. It's also going to require some amount of improvisation. It's not enough to assume that what you're going to be asked to do is something that you've seen before, or something that is a simple interpolation of things you've seen before. Yeah.
Speaker 1
In fact, that model breaks down even for tasks that look relatively simple from a distance, like L5 self-driving, for instance. Google had a paper a couple of years back showing that something like 30 million different road situations were actually completely insufficient to train a driving model. It wasn't even L2, right? And that's a lot of data. That's a lot more data than the 20 or 30 hours of driving that a human needs to learn to drive, given the knowledge they've already accumulated.
Speaker 2
Well, let me ask you on that topic. Elon Musk and Tesla Autopilot, one of the only companies, I believe, that is really pushing for a learning-based approach. Are you skeptical that that kind of network can achieve level four?
Speaker 1
L4 is probably achievable. L5 is probably
Speaker 2
not. What's the distinction there? L5 is completely autonomous, you can just fall asleep. Yeah. L5 is basically human level. Well, with driving, I have to be careful saying human level, because that's the clearest example of a case where cars will most likely be much safer than humans in many situations where humans fail. It's the vice versa question.
Speaker 1
I'll tell you, the thing is, the amount of training data you would need to anticipate pretty much every possible situation you'll encounter in the real world is such that it's not entirely unrealistic to think that at some point in the future we'll develop a system that's trained on enough data, especially provided that we can simulate a lot of that data. We don't necessarily need actual cars on the road for everything. But it's a massive effort. And it turns out you can create a system that's much more adaptive, that can generalize much better, if you just add explicit models of the surroundings of the car, and if you use deep learning for what it's good at, which is to provide perceptual information. So in general, deep learning is a way to encode perception and a way to encode intuition, but it is not a good medium for any sort of explicit reasoning. And in AI systems today, strong generalization tends to come from explicit models, from abstractions in the human mind that are encoded in program form by a human engineer, right? These are the abstractions you can actually generalize with, not the sort of weak abstractions that are learned by a neural network.
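A minimal sketch of the hybrid split described here, with deep learning confined to perception and explicit, hand-written models doing the reasoning. Every class and function name below is a hypothetical illustration, not the API of any real driving stack.

    # Sketch of a hybrid driving stack: a learned perception module feeds an
    # explicit, human-engineered world model and rule-based planner.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class DetectedObject:
        kind: str                      # e.g. "car", "pedestrian", "traffic_light"
        position: Tuple[float, float]  # (x, y) in the ego frame, meters
        velocity: Tuple[float, float]  # (vx, vy), meters per second

    class PerceptionNet:
        """The only learned component: raw pixels in, detected objects out."""
        def detect(self, camera_frame) -> List[DetectedObject]:
            # Placeholder: a trained neural network would produce detections here.
            return []

    class ExplicitWorldModel:
        """Hand-engineered abstractions: geometry, kinematics, right-of-way."""
        def __init__(self) -> None:
            self.objects: List[DetectedObject] = []

        def update(self, objects: List[DetectedObject]) -> None:
            self.objects = objects

        def time_to_collision(self, obj: DetectedObject, ego_speed: float) -> float:
            # Simple explicit kinematics rather than a learned estimate.
            if obj.position[0] <= 0:  # object is behind the ego vehicle
                return float("inf")
            closing_speed = ego_speed - obj.velocity[0]
            return float("inf") if closing_speed <= 0 else obj.position[0] / closing_speed

    def plan(world: ExplicitWorldModel, ego_speed: float) -> str:
        # Explicit rules generalize to situations never present in the training data.
        for obj in world.objects:
            if world.time_to_collision(obj, ego_speed) < 2.0:
                return "brake"
        return "keep_lane"

The point of the split is that the planner's behavior follows from abstractions a human wrote down, so it does not depend on the coverage of the training distribution the way an end-to-end network would.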
Speaker 2
Yeah. And the question is how much reasoning, how many strong abstractions, are required to solve particular tasks like driving. That's the question. Or human life, existence. How many strong abstractions does existence require? But more specifically on driving, that seems to be a coupled question about intelligence. How do you build an intelligent system? And the coupled problem: how hard is this problem? How much intelligence does this problem actually require? So we get to cheat, right? Because we get to look at the problem. It's not like we close our eyes and come to driving completely new. We get to do what we do as human beings, which is, for the majority of our life before we ever learn, quote unquote, to drive, we get to watch other cars and other people drive, we get to be in cars, we get to watch. We get to see movies about cars. We get to observe all that stuff. And that's similar to what neural networks are doing. It's getting a lot of data. And the question is, yeah, how many leaps
Speaker 1
of reasoning genius is required to be able to actually effectively drive? Take the example of driving. I mean, sure, you've seen a lot of cars in your life before you learned to drive. But let's say you've learned to drive in Silicon Valley, and now you rent a car in Tokyo. Well, now everyone is driving on the other side of the road, and the signs are different, and the roads are more narrow, and so on. So it's a very, very different environment. And a smart human, even an average human, should be able to just zero-shot it, to just be operational in this very different environment right away, despite having had no contact with the novel complexity that is contained in this environment, right? And that is novel complexity. It's not just interpolation over the situations that you've encountered previously, like learning to drive in Silicon Valley.
Speaker 2
I would say the reason I ask is that one of the most interesting tests of intelligence we have today, in terms of having an impact on the world, is driving. When do you think we'll pass that test of intelligence? So
Speaker 1
I don't think driving is that much of a test of intelligence, because again, there is no task for which skill at that task demonstrates intelligence, unless it's a kind of meta task that involves acquiring new skills. So I think you can actually solve driving without having any real amount of intelligence. For instance, if you really did have infinite training data, you could just literally train an end-to-end deep learning model that does driving, provided infinite training data. The only problem with the whole idea is collecting a dataset that's sufficiently comprehensive, that covers the very long tail of possible situations you might encounter. And it's really just a scale problem. So I think there's nothing fundamentally wrong with this plan, with this idea. It's just that it strikes me as a fairly inefficient thing to do, because you run into this scaling issue with diminishing returns. Whereas if instead you took a more manual engineering approach, where you use deep learning modules in combination with an engineered, explicit model of the surroundings of the car, and you bridge the two in a clever way, your model will actually start generalizing much earlier and more effectively than the end-to-end deep learning model. So why would you not go with the more manual, engineering-oriented approach? And even if you created that system, either the end-to-end deep learning system trained on infinite data or the more manually engineered system, I don't think achieving L5 would demonstrate general intelligence, or intelligence of any generality at all. Again, the only possible test of generality in AI would be a test that looks at skill acquisition over unknown tasks. For instance, you could take your L5 driver and ask it to learn to pilot a commercial airplane. And then you would look at how much human involvement is required and how much training data is required for the system to learn to pilot an airplane. And that gives you a measure of how intelligent the system really is.
Speaker 2
Yeah. Well, I mean, that's a big leap. I get you. But I'm more interested in driving as a problem. To me, driving is a black box that can generate novel situations at some rate, what people call edge cases. So it does have a newness that we keep being confronted with, let's say once a month. It
Speaker 1
is a very long tail.
Speaker 2
Yes. That doesn't mean you cannot solve it,
Speaker 1
just by training a statistical model on a lot of data. Huge amount of data. It's really a matter of scale. But I guess what I'm saying is,
Speaker 2
if you have a vehicle that achieves level five, it is going to be able to deal with new situations. Or, I mean, the data is so large that the rate of new situations is very low. Yes. That's not intelligence. So if we go back to your kind of definition of intelligence, it's the efficiency. With
Speaker 1
which you can adapt to new situations, to truly new situations, not situations you've seen before, right? Not situations that could be anticipated by your creators, by the creators of the system, but truly new situations. The efficiency with which you acquire new skills. If, in order to pick up a new skill, you require a very extensive training dataset covering most possible situations that can occur in the practice of that skill, then the system is not intelligent. It is mostly just a lookup table. Yeah. Well... Likewise, if in order to acquire a skill, you need a human engineer to write down a bunch of rules that cover most or every possible situation, the system is not intelligent. The system is merely the output artifact of a process that happens in the minds of the engineers that are creating it, right? It is encoding an abstraction that's produced by the human mind. And intelligence would actually be the process of autonomously producing this abstraction.
Speaker 1
If you take an abstraction and you encode it on a piece of paper or in a computer program, the abstraction itself is not intelligent. What's intelligent is the agent that's capable of producing these abstractions, right?
Speaker 2
Yeah. It feels like there's a little bit of a gray area there, because you're basically saying that deep learning forms abstractions too. But those abstractions do not seem to be effective for generalizing far outside of the things it's already seen. They only generalize a little bit.
Speaker 1
Yeah, absolutely. No, deep learning does generalize a little bit. Like, generalization is not a binary. It's more like a spectrum. Yeah. And
Speaker 2
there's a certain point, it's a gray area, but there's a point where an impressive degree of generalization happens. I guess exactly what you were saying is, intelligence is how efficiently you're able to generalize far outside of the distribution of things you've seen already. Yes. So it's both the distance, how radically new something is, and how efficiently you're able to deal with that. So you can think
Speaker 1
of intelligence as a measure of an information conversion ratio. Imagine a space of possible situations, and you've covered some of them. So you have some amount of information about your space of possible situations that's provided by the situations you already know, and, on the other hand, by the prior knowledge that the system brings to the table, the prior knowledge that's embedded in the system. So the system starts with some information, right, about the problem, about the task. And it's about going from that information to a program, what we would call a skill program, a behavioral program, that can cover a large area of possible situation space. And essentially the ratio between that area and the amount of information you start with is intelligence. So a very smart agent can make efficient use of very little information about a new problem, and very little prior knowledge as well, to cover a very large area of potential situations in that problem, without knowing what these future new situations are going to be.
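A toy illustration of that conversion ratio. The quantities and names below are illustrative assumptions, not the exact formalism from On the Measure of Intelligence.

    # Toy sketch: intelligence as an information-conversion ratio, i.e. how much of
    # the space of possible situations the resulting skill program covers, relative
    # to the information (priors plus experience) the system was given.
    def conversion_ratio(situations_covered: float,
                         prior_information: float,
                         experience_information: float) -> float:
        """Higher means more situation coverage bought per unit of information supplied."""
        return situations_covered / (prior_information + experience_information)

    # A system that covers a wide area of situation space from few priors and little
    # data scores high; a lookup table trained on nearly every situation does not.
    print(conversion_ratio(1000.0, prior_information=10.0, experience_information=40.0))   # 20.0
    print(conversion_ratio(1000.0, prior_information=10.0, experience_information=990.0))  # 1.0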
Speaker 2
So one of the other big things you talk about in the paper, we've talked about it a little bit already, but let's talk about it some more, is actual tests of intelligence. If we look at human and machine intelligence, do you think tests of intelligence should be different for humans and machines, or, in how we think about testing intelligence, are these fundamentally the same kinds of intelligence that we're after, and therefore the tests should be similar?
Speaker 1
So if your goal is to create AIs that are more human-like, then it would be super valuable, obviously, to have a test that's universal, that applies to both AIs and humans, so that you could establish a comparison between the two, so that you could tell exactly how intelligent, in terms of human intelligence, a given system is. That said, the constraints that apply to artificial intelligence and to human intelligence are very different, and your test should account for this difference. Because with artificial systems, it's always possible for an experimenter to buy arbitrary levels of skill at arbitrary tasks, either by injecting hard-coded prior knowledge into the system via rules and so on that come from the human mind, from the minds of the programmers, or by buying higher levels of skill just by training on more data. For instance, you could generate an infinity of different Go games and you could train a Go-playing system that way, but you could not directly compare it to human Go-playing skills, because a human that plays Go had to develop that skill in a very constrained environment. They had a limited amount of time, they had a limited amount of energy. And of course, they started from a different set of priors, from innate human priors. So I think if you want to compare the intelligence of two systems, like the intelligence of an AI and the intelligence of a human, you have to control for priors. You have to start from the same set of knowledge priors about the task, and you have to control for experience, that is to say, for training data.
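A sketch of what controlling for priors and experience could look like in an evaluation harness. The interfaces, prior list, and example budget below are assumptions for illustration, in the spirit of this argument rather than any actual benchmark API.

    # Sketch of a comparison protocol: every system, human or machine, declares the
    # same set of priors and sees the same fixed number of examples per task before
    # being scored on held-out test examples. Names and budgets are illustrative only.
    SHARED_PRIORS = ["objectness", "basic geometry and topology", "counting", "goal-directedness"]
    EXPERIENCE_BUDGET = 3  # demonstration examples per task, identical for every system

    def evaluate(system, tasks) -> float:
        system.declare_priors(SHARED_PRIORS)  # no extra task-specific knowledge allowed
        scores = []
        for task in tasks:
            demos = task.train_examples[:EXPERIENCE_BUDGET]  # controlled experience
            system.learn(demos)
            correct = sum(system.solve(x) == y for x, y in task.test_examples)
            scores.append(correct / len(task.test_examples))
        return sum(scores) / len(scores)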
Speaker 2
What do you mean by priors?
Speaker 1
So a prior is whatever information you have about a given task before you start learning about this task. And how is that different from experience? Well, experience is acquired, right? So for instance, if you're trying to play Go, your experience with Go is all the Go games you've played, or seen, or simulated in your mind, let's say. And your priors are things like: well, Go is a game on a 2D grid, and we have lots of hard-coded priors about the organization of 2D space.
Speaker 2
So rules of how the dynamics, the physics of this game, work in this 2D space. Yes. And the idea of what winning is.
Speaker 1
Yes, exactly. And other board games can also share some similarities with Go. And if you've played these board games, then with respect to the game of Go, that would be part of your priors about the game.
Speaker 2
Well, what's interesting to think about with the game of Go is how many priors are actually brought to the table. When you look at self-play, reinforcement learning-based mechanisms that do the learning, it seems like the number of priors is pretty low. Yes. But you're saying you should be... There is a 2D spatial
Speaker 1
prior in the convnet. Right. But you should be clear about making those priors explicit. Yes. So in particular, I think if your goal is to measure a human-like form of intelligence, then you should clearly establish that you want the AI you're testing to start from the same set of priors that humans start with.
Speaker 2
Right. So, I mean, to me personally, but I think to a lot of people, the human side of things is very interesting. So testing intelligence for humans, what do you think is a good test of human intelligence? Well,
Speaker 1
that's the question that psychometrics is interested in. There's an entire subfield of psychology that deals with this question.
Speaker 2
So what's psychometrics? Psychometrics
Speaker 1
is the subfield of psychology that tries to measure, quantify aspects of the human mind. So in particular, our cognitive abilities, intelligence, and personality traits as well.