The episode centers on reverse engineering kernels to understand the inductive bias of neural network architectures.
Optimization landscapes of large-scale neural networks may lack bad local minima.
The eigenlearning framework quantifies how a kernel allocates 'learnability' across its eigenmodes, which shapes its generalization.
Deep dives
Exploring Inductive Bias in Architectures
The podcast discusses the idea of reverse engineering kernels to understand the inductive bias of architectures in kernel space. Rather than dwelling on specific theorems, the central message is an encouragement to design neural networks from first principles in kernel space. The guest, Jamie Simon, a PhD student, shares his journey from theoretical physics into machine learning, driven by the challenge of understanding how neural networks learn.
Optimization Landscapes of Neural Networks
The discussion turns to the complexity of optimization landscapes in neural networks, particularly the claim that large-scale networks have no bad local minima. Inspired by statistical mechanics, the theory suggests that critical points in these landscapes tend to be globally optimal rather than bad local traps. In experiments, however, it turned out that standard training algorithms can converge to points that are not true critical points of the loss.
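To make that last point concrete, here is a minimal PyTorch sketch, with an assumed toy dataset and architecture rather than anything from the episode, of the kind of check involved: measure the full-batch gradient norm at the point where the optimizer settles to see whether it is actually a critical point.

```python
# Minimal sketch (assumed toy data and architecture, not from the episode):
# train a small network, then measure the full-batch gradient norm at the
# stopping point. A norm far from zero means the optimizer did not stop at
# a true critical point of the loss.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 10)    # toy regression inputs
y = torch.randn(256, 1)     # toy regression targets

model = nn.Sequential(nn.Linear(10, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for _ in range(2000):       # train until the loss roughly plateaus
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()

# Full-batch gradient at the point the optimizer settled on.
model.zero_grad()
loss = loss_fn(model(X), y)
loss.backward()
grad_norm = torch.sqrt(sum((p.grad ** 2).sum() for p in model.parameters()))
print(f"final loss {loss.item():.4f}, gradient norm {grad_norm.item():.4e}")
```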
Mode Connectivity and Infinite Width Networks
The conversation shifts toward mode connectivity in neural networks, especially in infinite-width networks. Exploring mode connectivity yields insights into how kernel functions align with data, how predictable network behavior is, and why infinite-width networks admit a simpler analytical treatment. This understanding has implications for the generalization of neural networks and potential applications in discovering new network architectures.
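For readers who want a feel for what a mode-connectivity experiment looks like, below is a minimal sketch; the toy data, architecture, and the choice of a straight-line path (rather than the curved paths studied in the mode-connectivity literature) are all assumptions made for illustration.

```python
# Minimal mode-connectivity probe (assumed setup): train two networks from
# different random initializations, then evaluate the loss along the linear
# interpolation between their weights.
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

def train(model, X, y, steps=2000, lr=1e-2):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
    return model

torch.manual_seed(0)
X, y = torch.randn(512, 10), torch.randn(512, 1)   # toy regression data

model_a = train(make_model(), X, y)
model_b = train(make_model(), X, y)                # independent initialization

loss_fn = nn.MSELoss()
state_a, state_b = model_a.state_dict(), model_b.state_dict()
for alpha in [i / 10 for i in range(11)]:
    interp = make_model()
    # Interpolated weights: (1 - alpha) * A + alpha * B
    interp.load_state_dict({k: (1 - alpha) * state_a[k] + alpha * state_b[k]
                            for k in state_a})
    with torch.no_grad():
        print(f"alpha={alpha:.1f}  loss={loss_fn(interp(X), y).item():.4f}")
# A bump in the middle is a loss barrier; mode-connectivity results show that
# low-loss (often curved) paths between such solutions frequently exist.
```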
Understanding the Inductive Bias of Kernels
One key insight discussed in the podcast concerns the eigenfunctions and eigenvalues of a kernel. Decomposing a kernel into its eigenfunctions and eigenvalues makes it apparent that kernel methods preferentially learn functions aligned with high-eigenvalue modes, which are typically smoother and simpler. This observation leads to the eigenlearning framework, which quantifies how a kernel allocates 'learnability' to its different eigenmodes, and thereby how well the kernel generalizes on a given data distribution.
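A small numerical sketch in the spirit of that framework is below; the power-law eigenvalue spectrum and the sample size are illustrative assumptions, not values from the episode.

```python
# Illustrative eigenlearning-style calculation (assumed spectrum and sample
# size). Given a kernel's eigenvalues and n training samples, solve for the
# constant kappa and compute each mode's learnability
#   L_i = lambda_i / (lambda_i + kappa).
import numpy as np
from scipy.optimize import brentq

eigvals = 1.0 / np.arange(1, 1001) ** 2   # assumed power-law kernel spectrum
n = 50                                     # number of training samples

# In the ridgeless case, kappa is fixed implicitly by requiring the
# learnabilities to sum to n -- the conservation law mentioned in the episode.
kappa = brentq(lambda k: np.sum(eigvals / (eigvals + k)) - n, 1e-12, 1e3)
learnability = eigvals / (eigvals + kappa)

print(f"kappa = {kappa:.3e}")
print(f"sum of learnabilities = {learnability.sum():.2f} (= n = {n})")
print("learnability of the top 5 modes:", np.round(learnability[:5], 3))
# High-eigenvalue modes get learnability near 1 (learned almost perfectly
# from n samples); low-eigenvalue modes receive almost none.
```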
Utility of the Eigenlearning Framework
The podcast also delves into practical applications of the eigenlearning framework. The framework makes it possible to estimate the mean squared error and the smoothness of the learned function analytically, without training the model. By understanding how a kernel allocates learnability across eigenmodes, researchers can see why certain kernels perform better, tune hyperparameters efficiently, and potentially improve the generalization of machine learning models.
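As a rough illustration of such a training-free estimate, the sketch below predicts test MSE from the kernel eigenvalues and a hypothetical target function's eigencoefficients; the spectrum, the target, and the particular closed-form risk estimate used are assumptions drawn from the kernel-regression literature rather than from the episode.

```python
# Sketch of estimating test MSE without training, under assumed inputs: a
# power-law kernel spectrum and a hypothetical target whose eigencoefficients
# v_i sit on the first 10 modes. The risk formula below is one common
# closed-form estimate from the kernel-regression literature.
import numpy as np
from scipy.optimize import brentq

eigvals = 1.0 / np.arange(1, 1001) ** 2          # assumed kernel spectrum
v = np.zeros_like(eigvals)
v[:10] = 1.0 / np.arange(1, 11)                  # hypothetical target eigencoefficients
n = 50                                           # training-set size

kappa = brentq(lambda k: np.sum(eigvals / (eigvals + k)) - n, 1e-12, 1e3)
L = eigvals / (eigvals + kappa)                  # per-mode learnabilities

# Unlearned target weight in each mode is (1 - L_i) * v_i; an overall
# overfitting coefficient amplifies the resulting error.
overfit = n / (n - np.sum(L ** 2))
mse_estimate = overfit * np.sum((1 - L) ** 2 * v ** 2)
print(f"predicted test MSE: {mse_estimate:.4f}  (no model was trained)")
```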
Jamie Simon is a 4th year Ph.D. student at UC Berkeley advised by Mike DeWeese, and also a Research Fellow with us at Generally Intelligent. He uses tools from theoretical physics to build a fundamental understanding of deep neural networks so they can be designed from first principles. In this episode, we discuss reverse engineering kernels, the conservation of learnability during training, infinite-width neural networks, and much more.
About Generally Intelligent
We started Generally Intelligent because we believe that software with human-level intelligence will have a transformative impact on the world. We’re dedicated to ensuring that that impact is a positive one.
We have enough funding to freely pursue our research goals over the next decade, and our backers include Y Combinator, researchers from OpenAI, Astera Institute, and a number of private individuals who care about effective altruism and scientific research.
Our research is focused on agents for digital environments (ex: browser, desktop, documents), using RL, large language models, and self-supervised learning. We’re excited about opportunities to use simulated data, network architecture search, and good theoretical understanding of deep learning to make progress on these problems. We take a focused, engineering-driven approach to research.