Speaker 2
Yeah, that's a challenge. Actually, this leads me to one of the questions I've been meaning to ask you. It's about the relationship between these old-school optimization methods, gradient descent, stochastic gradient descent, and so on, and how they are used today. There seems to be a big misconception in the computer science community when it comes to these tools: it seems to me that most practitioners in the machine learning community believe that gradient descent solves all problems, which is a little bit what we discussed regarding non-convexity. What is your take on this big misconception?
Speaker 1
So this is something I think about almost every day. If you look at the neural networks we have today, ChatGPT, Stable Diffusion, image generation models, now audio versions of diffusion generating music, voice cloning, it all runs on gradient descent. Although it's very simple, it is solving a lot of problems, including problems we deemed impossible. Some of them we were not even able to imagine: generating photorealistic images is something we had no idea about. Gradient descent is doing all of that.

Is it going to solve all problems? I don't think so. Let me give you an example. There is a field of neural network research called physics-informed neural networks, where people use neural networks to solve differential equations and model physical phenomena. In those problems, you typically need to satisfy a constraint, so the neural network cannot really be arbitrary. It needs to satisfy some hard constraint, for instance that the prediction at the value zero must be exactly zero. In the context of a differential equation, there are boundary conditions. So if you are reconstructing an electromagnetic field, it needs to obey some equations, like Maxwell's equations, as well as some boundary conditions: maybe there's a conductor, and then the field must be zero over it.

Doing this with neural networks using gradient descent is extremely challenging, because gradient descent on non-convex functions is hard to deal with when you have constraints. There are some options for modeling constraints in gradient descent. One is reparameterization, and that works poorly. Reparameterization means that if you need a variable to be positive, you just square that variable and train the squared version. But we know that messes up the conditioning of the problem. It is quite undesirable to square a number if you don't really need to.
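The reparameterization trick described here, squaring a variable to enforce positivity and then running plain gradient descent, can be sketched on a toy problem. This is a minimal illustration, not from the conversation: the objective f(w) = (w - 2)^2 with constraint w >= 0, the learning rate, and the step count are all assumed for the example. It also shows the pathology being described: at v = 0 the reparameterized gradient vanishes identically, so the iterate can get stuck on the constraint boundary.

```python
def fit_reparam(v0=1.0, lr=0.05, steps=500):
    """Minimize f(w) = (w - 2)**2 subject to w >= 0 by the
    reparameterization w = v**2, running gradient descent on v."""
    v = v0
    for _ in range(steps):
        w = v * v
        # chain rule: dF/dv = df/dw * dw/dv = 2*(w - 2) * 2*v
        grad_v = 2.0 * (w - 2.0) * 2.0 * v
        v -= lr * grad_v
    return v * v  # report the constrained variable w = v**2

w_ok = fit_reparam(v0=1.0)     # converges to the constrained optimum w = 2
w_stuck = fit_reparam(v0=0.0)  # gradient is identically zero at v = 0: stuck at w = 0
```

Note how the same objective that is benign in w becomes a quartic in v, with a spurious stationary point at v = 0; that change of curvature is the conditioning damage being referred to.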
In classical optimization theory, we have some better tools, like barriers. A barrier penalizes a certain region so that the iterates avoid it; the logarithm, for instance, is a barrier against negative numbers. But again, this messes up the conditioning, and for non-convex problems it does not work as well as it does for convex problems. So I would say gradient descent is not really solving physics-constrained problems, and I think this is a good example.

Another interesting example is from control. If you are using neural networks to control a system like a robotic arm, there are again constraints: physical constraints as well as human-imposed safety constraints. These safety constraints are very hard to satisfy, and moreover, it's quite hard to verify that a system is safe if it involves neural networks. Whenever you have a neural network controller, it has an enormous number of distinct operating regions, because each activation can be on or off. And that gives you a really big problem: verifying that the system is safe is intractable. So I think this is again a problem where gradient descent is not
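The log-barrier idea can be sketched in the same assumed toy setting (again an illustration, not from the conversation): subtract mu * log(w) from the objective so iterates are pushed away from w <= 0. The barrier's gradient term mu/w blows up as w approaches the boundary, which is one concrete form of the conditioning damage being described.

```python
def fit_barrier(w0=1.0, mu=1e-3, lr=0.05, steps=2000):
    """Minimize f(w) = (w - 2)**2 subject to w > 0 via a log barrier:
    F(w) = (w - 2)**2 - mu * log(w)."""
    w = w0
    for _ in range(steps):
        # barrier gradient mu/w explodes as w -> 0: poor conditioning near the boundary
        grad = 2.0 * (w - 2.0) - mu / w
        w -= lr * grad
    return w
```

The minimizer sits just above the unconstrained optimum (the barrier nudges it to roughly 2 + mu/4 here), and the barrier's curvature mu/w**2 grows without bound near w = 0, so the usable step size shrinks as the constraint becomes active.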