

[23] Simon Du - Gradient Descent for Non-convex Problems in Modern Machine Learning
Apr 16, 2021
Simon Du, an Assistant Professor at the University of Washington, delves into the theoretical foundations of deep learning and gradient descent. He discusses the intricacies of addressing non-convex problems, revealing challenges and insights from his research. The conversation highlights the significance of the neural tangent kernel and its implications for optimization and generalization. Simon also shares practical tips for reading research papers, drawing connections between theory and practice, and navigating a successful research career.
Role of Theory
- Theory helps us understand why methods work (or don't).
- It also guides practical application and new method design.
Theory vs. Empiricism
- Theoretical understanding provides abstract and rigorous explanations for observed phenomena.
- Unlike empirical approaches, it generalizes across broader classes of data and methods.
Deep Learning Theory Like Physics
- Deep learning theory resembles physics: it starts with empirical observations and then develops rigorous explanations for them.
- It contrasts with classical ML theory, which designs algorithms for predefined problems.