In episode 60 of The Gradient Podcast, Daniel Bashir speaks to Hattie Zhou.
Hattie is a PhD student at the Université de Montréal and Mila. Her research focuses on understanding how and why neural networks work, based on the belief that the performance of modern neural networks exceeds our understanding and that building more capable and trustworthy models requires bridging this gap. Prior to Mila, she spent time as a data scientist at Uber and did research with Uber AI Labs.
Have suggestions for future podcast guests (or other feedback)? Let us know here!
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (01:55) Hattie’s Origin Story, Uber AI Labs, empirical theory and other sorts of research
* (10:00) Intro to the Lottery Ticket Hypothesis & Deconstructing Lottery Tickets
* (14:30) Lottery tickets as lucky initialization
* (17:00) Types of masking and the “masking is training” claim
* (24:00) Type-0 masks and weight evolution over long training trajectories
* (27:00) Can you identify good masks or training trajectories a priori?
* (29:00) The role of signs in neural net initialization
* (35:27) The Supermask
* (41:00) Masks to probe pretrained models and model steerability
* (47:40) Fortuitous Forgetting in Connectionist Networks
* (54:00) Relationships to other work (double descent, grokking, etc.)
* (1:01:00) The iterative training process in fortuitous forgetting, scale and value of exploring alternatives
* (1:03:35) In-Context Learning and Teaching Algorithmic Reasoning
* (1:09:00) Learning + algorithmic reasoning, prompting strategy
* (1:13:50) What’s happening with in-context learning?
* (1:14:00) Induction heads
* (1:17:00) ICL and gradient descent
* (1:22:00) Algorithmic prompting vs discovery
* (1:24:45) Future directions for algorithmic prompting
* (1:26:30) Interesting work from NeurIPS 2022
* (1:28:20) Hattie’s perspective on scientific questions people pay attention to, underrated problems
* (1:34:30) Hattie’s perspective on ML publishing culture
* (1:42:12) Outro
Links:
* Hattie’s homepage and Twitter
* Papers
  * Deconstructing Lottery Tickets: Zeros, signs, and the Supermask
  * Fortuitous Forgetting in Connectionist Networks
  * Teaching Algorithmic Reasoning via In-context Learning
Get full access to The Gradient at thegradientpub.substack.com/subscribe