Comparing Different Methods in Small Language Models
Exploring the scaling issues with PDIA and the evolution of language models in AI development, from n-gram models to RNNs, while discussing the challenges and benefits of creating small language models that grow with data and cover the entire Pareto frontier.
Episode 130
I spoke with David Pfau about:
* Spectral learning and ML
* Learning to disentangle manifolds and (projective) representation theory
* Deep learning for computational quantum mechanics
* Picking and pursuing research problems and directions
David’s work is really (times k for some very large value of k) interesting—I’ve been inspired to descend a number of rabbit holes because of it.
(if you listen to this episode, you might become as cool as this guy)
While I’m at it — I’m still hovering around 40 ratings on Apple Podcasts. It’d mean a lot if you’d consider helping me bump that up!
Enjoy—and let me know what you think!
David is a staff research scientist at Google DeepMind. He is also a visiting professor at Imperial College London in the Department of Physics, where he supervises work on applications of deep learning to computational quantum mechanics. His research interests span artificial intelligence, machine learning and scientific computing.
Find me on Twitter for updates on new episodes, and reach me at editor@thegradient.pub for feedback, ideas, guest suggestions.
I spend a lot of time on this podcast—if you like my work, you can support me on Patreon :) You can also support upkeep for the full Gradient team/project through a paid subscription on Substack!
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (00:52) David Pfau the “critic”
* (02:05) Scientific applications of deep learning — David’s interests
* (04:57) Brain / neural network analogies
* (09:40) Modern ML systems and theories of the brain
* (14:19) Desirable properties of theories
* (18:07) Spectral Inference Networks
* (19:15) Connections to FermiNet / computational physics, a series of papers
* (33:52) Deep slow feature analysis — interpretability and findings on eigenfunctions
* (39:07) Following up on eigenfunctions (there are indeed only so many hours in a day; I have been asking the Substack people if they can ship 40-hour days, but I don’t think they’ve gotten to it yet)
* (42:17) Power iteration and intuitions
* (45:23) Projective representation theory
* (46:00) ???
* (46:54) Geomancer and learning to decompose a manifold from data
* (47:45) we consider the question of whether you will spend 90 more minutes of this podcast episode (there are not 90 more minutes left in this podcast episode, but there could have been)
* (1:08:47) Learning embeddings
* (1:11:12) The “unexpected emergent property” of Geomancer
* (1:14:43) Learned embeddings and disentangling and preservation of topology
* n/b I still haven’t managed to do this in colab because I keep crashing my instance when I use s3o4d :(
* (1:21:07) What’s missing from the ~ current (deep learning) paradigm ~
* (1:29:04) LLMs as swiss-army knives
* (1:32:05) RL and human learning — TD learning in the brain
* (1:37:43) Models that cover the Pareto Front (image below)
* (1:46:54) AI accelerators and doubling down on transformers
* (1:48:27) On Slow Research — chasing big questions and what makes problems attractive
* (1:53:50) Future work on Geomancer
* (1:55:35) Finding balance in pursuing interesting and lucrative work
* (2:00:40) Outro
Links:
* Papers
  * Natural Quantum Monte Carlo Computation of Excited States (2023)
  * Making sense of raw input (2021)
  * Integrable Nonparametric Flows (2020)
  * Disentangling by Subspace Diffusion (2020)
  * Ab initio solution of the many-electron Schrödinger equation with deep neural networks (2020)
  * Spectral Inference Networks (2018)
  * Connecting GANs and Actor-Critic Methods (2016)
  * Learning Structure in Time Series for Neuroscience and Beyond (2015, dissertation)
  * Robust learning of low-dimensional dynamics from large neural ensembles (2013)
  * Probabilistic Deterministic Infinite Automata (2010)
* Other