
The Gradient: Perspectives on AI

Latest episodes

Mar 30, 2023 • 1h 8min

Soumith Chintala: PyTorch

In episode 66 of The Gradient Podcast, Daniel Bashir speaks to Soumith Chintala.

Soumith is a Research Engineer at Meta AI Research in NYC. He is the co-creator and lead of PyTorch, and maintains a number of other open-source ML projects, including Torch-7 and EBLearn. Soumith has previously worked on robotics, object and human detection, generative modeling, AI for video games, and ML systems research.

Outline:
* (00:00) Intro
* (01:30) Soumith’s intro to AI and journey to PyTorch
* (05:00) State of computer vision early in Soumith’s career
* (09:15) Institutional inertia and sunk costs in academia, identifying fads
* (12:45) How Soumith started working on GANs, frustrations
* (17:45) State of ML frameworks early in the deep learning era, differentiators
* (23:50) Frameworks and leveling the playing field, exceptions
* (25:00) Contributing to Torch and evolution into PyTorch
* (29:15) Soumith’s product vision for ML frameworks
* (32:30) From product vision to concrete features in PyTorch
* (39:15) Progressive disclosure of complexity (Chollet) in PyTorch
* (41:35) Building an open source community
* (43:25) The different players in today’s ML framework ecosystem
* (49:35) ML frameworks pioneered by Yann LeCun and Léon Bottou, their influences on PyTorch
* (54:37) PyTorch 2.0 and looking to the future
* (58:00) Soumith’s adventures in household robotics
* (1:03:25) Advice for aspiring ML practitioners
* (1:07:10) Be cool like Soumith and subscribe :)
* (1:07:33) Outro

Links:
* Soumith’s Twitter and homepage
* Papers:
  * Convolutional Neural Networks Applied to House Numbers Digit Classification
  * GANs: LAPGAN, DCGAN, Wasserstein GAN
  * Automatic differentiation in PyTorch
  * PyTorch: An Imperative Style, High-Performance Deep Learning Library

Mar 23, 2023 • 1h 43min

Sewon Min: The Science of Natural Language

In episode 65 of The Gradient Podcast, Daniel Bashir speaks to Sewon Min.

Sewon is a fifth-year PhD student in the NLP group at the University of Washington, advised by Hannaneh Hajishirzi and Luke Zettlemoyer. She is a part-time visiting researcher at Meta AI and a recipient of the JP Morgan PhD Fellowship. She has previously spent time at Google Research and Salesforce Research.

Outline:
* (00:00) Intro
* (03:00) Origin Story
* (04:20) Evolution of Sewon’s interests, question-answering and practical NLP
* (07:00) Methodology concerns about benchmarks
* (07:30) Multi-hop reading comprehension
* (09:30) Do multi-hop QA benchmarks actually measure multi-hop reasoning?
* (12:00) How models can “cheat” multi-hop benchmarks
* (13:15) Explicit compositionality
* (16:05) Commonsense reasoning and background information
* (17:30) On constructing good benchmarks
* (18:40) AmbigQA and ambiguity
* (22:20) Types of ambiguity
* (24:20) Practical possibilities for models that can handle ambiguity
* (25:45) FaVIQ and fact-checking benchmarks
* (28:45) External knowledge
* (29:45) Fact verification and “complete understanding of evidence”
* (31:30) Do models do what we expect/intuit in reading comprehension?
* (34:40) Applications for fact-checking systems
* (36:40) Intro to in-context learning (ICL)
* (38:55) Example of an ICL demonstration
* (40:45) Rethinking the Role of Demonstrations and what matters for successful ICL
* (43:00) Evidence for a Bayesian inference perspective on ICL
* (45:00) ICL + gradient descent and what it means to “learn”
* (47:00) MetaICL and efficient ICL
* (49:30) Distance between tasks and MetaICL task transfer
* (53:00) Compositional tasks for language models, compositional generalization
* (55:00) The number and diversity of meta-training tasks
* (58:30) MetaICL and Bayesian inference
* (1:00:30) Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations
* (1:02:00) The copying effect
* (1:03:30) Copying effect for non-identical examples
* (1:06:00) More thoughts on ICL
* (1:08:00) Understanding Chain-of-Thought Prompting
* (1:11:30) Bayes strikes again
* (1:12:30) Intro to Sewon’s text retrieval research
* (1:15:30) Dense Passage Retrieval (DPR)
* (1:18:40) Similarity in QA and retrieval
* (1:20:00) Improvements for DPR
* (1:21:50) Nonparametric Masked Language Modeling (NPM)
* (1:24:30) Difficulties in training NPM and solutions
* (1:26:45) Follow-on work
* (1:29:00) Important fundamental limitations of language models
* (1:31:30) Sewon’s experience doing a PhD
* (1:34:00) Research challenges suited for academics
* (1:35:00) Joys and difficulties of the PhD
* (1:36:30) Sewon’s advice for aspiring PhDs
* (1:38:30) Incentives in academia, production of knowledge
* (1:41:50) Outro

Links:
* Sewon’s homepage and Twitter
* Papers:
  * Solving and re-thinking benchmarks:
    * Multi-hop Reading Comprehension through Question Decomposition and Rescoring / Compositional Questions Do Not Necessitate Multi-hop Reasoning
    * AmbigQA: Answering Ambiguous Open-domain Questions
    * FaVIQ: FAct Verification from Information-seeking Questions
  * Language modeling:
    * Rethinking the Role of Demonstrations
    * MetaICL: Learning to Learn In Context
    * Towards Understanding CoT Prompting
    * Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations
  * Text representation/retrieval:
    * Dense Passage Retrieval
    * Nonparametric Masked Language Modeling

Mar 16, 2023 • 1h 38min

Richard Socher: Re-Imagining Search

In episode 64 of The Gradient Podcast, Daniel Bashir speaks to Richard Socher.

Richard is founder and CEO of you.com, a new search engine that lets you personalize your search workflow and eschews tracking and invasive ads. Richard was previously Chief Scientist at Salesforce, where he led work on fundamental and applied research, product incubation, CRM search, customer service automation, and a cross-product AI platform. He was an adjunct professor at Stanford’s CS department as well as founder and CEO/CTO of MetaMind, which was acquired by Salesforce in 2016. He received his PhD from Stanford’s CS Department in 2014.

Outline:
* (00:00) Intro
* (02:20) Richard Socher origin story + time at MetaMind and Salesforce (AI Economist, CTRL, ProGen)
* (22:00) Why Richard advocated for deep learning in NLP
* (27:00) Richard’s perspective on language
* (32:20) Is physical grounding and language necessary for intelligence?
* (40:10) Frankfurtian b******t and language model utterances as truth
* (47:00) Lessons from Salesforce Research
* (53:00) Balancing fundamental research with product focus
* (57:30) The AI Economist + how should policymakers account for limitations?
* (1:04:50) you.com, the chatbot wars, and taking on search giants
* (1:13:50) Re-imagining the vision for and components of a search engine
* (1:18:00) The future of generative models in search and the internet
* (1:28:30) Richard’s advice for early-career technologists
* (1:37:00) Outro

Links:
* Richard’s Twitter
* YouChat by you.com
* Careers at you.com
* Papers mentioned:
  * Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions
  * Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
  * Grounded Compositional Semantics for Finding and Describing Images with Sentences
  * The AI Economist
  * ProGen
  * CTRL

Mar 9, 2023 • 1h 6min

Joe Edelman: Meaning-Aligned AI

In episode 63 of The Gradient Podcast, Daniel Bashir speaks to Joe Edelman.

Joe developed the meaning-based organizational metrics at Couchsurfing.com, then co-founded the Center for Humane Technology with Tristan Harris, and coined the term “Time Well Spent” for a family of metrics adopted by teams at Facebook, Google, and Apple. Since then, he's worked on the philosophical underpinnings for new business metrics, design methods, and political movements. The central idea is to make people's sources of meaning explicit, so that how meaningful or meaningless things are can be rigorously accounted for. His previous career was in HCI and programming language design.

Outline:
* (00:00) Intro (yes, Daniel is trying a new intro format)
* (01:30) Joe’s origin story
* (07:15) Revealed preferences and personal meaning, recommender systems
* (12:30) Is using revealed preferences necessary?
* (17:00) What are values and how do you detect them?
* (24:00) Figuring out what’s meaningful to us
* (28:45) The decline of spaces and togetherness
* (35:00) Individualism and economic/political theory, tensions between collectivism/individualism
* (41:00) What it looks like to build spaces, Habitat
* (47:15) Cognitive effects of social platforms
* (51:45) Atomized communication, re-imagining chat apps
* (55:50) Systems for social groups and medium independence
* (1:02:45) Spaces being built today
* (1:05:15) Joe is building research groups! Get in touch :)
* (1:05:40) Outro

Links:
* Joe's 80-minute lecture on techniques for rebuilding society on meaning (YouTube, transcript)
* The Discord for Rebuilding Meaning—join if you'd like to help build ML models or metrics using the methods discussed
* Writing/papers mentioned:
  * Tech products (that don’t cause depression and war)
  * Values, Preferences, Meaningful Choice
  * Social Programming Considered as a Habitat for Groups
  * Is Anything Worth Maximizing
* Joe’s homepage, Twitter, and YouTube page

Mar 2, 2023 • 1h 14min

Ed Grefenstette: Language, Semantics, Cohere

In episode 62 of The Gradient Podcast, Daniel Bashir speaks to Ed Grefenstette.

Ed is Head of Machine Learning at Cohere and an Honorary Professor at University College London. He previously held research scientist positions at Facebook AI Research and DeepMind, following a stint as co-founder and CTO of Dark Blue Labs. Before his time in industry, Ed worked at Oxford’s Department of Computer Science as a lecturer and Fulford Junior Research Fellow at Somerville College. Ed also received his MSc and DPhil from Oxford’s Computer Science Department.

Outline:
* (00:00) Intro
* (02:18) The Ed Grefenstette Origin Story
* (08:15) Distributional semantics and Ed’s PhD research
* (14:30) Extending the distributional hypothesis, later Wittgenstein
* (18:00) Recovering parse trees in LMs; can LLMs understand communication and not just bare language?
* (23:15) LMs capture something about pragmatics, proxies for grounding and pragmatics
* (25:00) Human-in-the-loop training and RLHF—what is the essential differentiator?
* (28:15) A convolutional neural network for modeling sentences, relationship to attention
* (34:20) Difficulty of constructing supervised learning datasets, benchmark-driven development
* (40:00) Learning to Transduce with Unbounded Memory, Neural Turing Machines
* (47:40) If RNNs are like finite state machines, where are transformers?
* (51:40) Cohere and why Ed joined
* (56:30) Commercial applications of LLMs and Cohere’s product
* (59:00) Ed’s reply to stochastic parrots and thoughts on consciousness
* (1:03:30) Lessons learned about doing effective science
* (1:05:00) Where does scaling end?
* (1:07:00) Why Cohere is an exciting place to do science
* (1:08:00) Ed’s advice for aspiring ML {researchers, engineers, etc.} and the role of communities in science
* (1:11:45) Cohere for AI plug!
* (1:13:30) Outro

Links:
* Ed’s homepage and Twitter
* (Some of) Ed’s papers:
  * Experimental support for a categorical compositional distributional model of meaning
  * Multi-step regression learning
  * “Not not bad” is not “bad”
  * Towards a formal distributional semantics
  * A CNN for modeling sentences
  * Teaching machines to read and comprehend
  * Reasoning about entailment with neural attention
  * Learning to Transduce with Unbounded Memory
  * Teaching Artificial Agents to Understand Language by Modelling Reward
* Other things mentioned:
  * Large language models are not zero-shot communicators (Laura Ruis + others and Ed)
  * Looped Transformers as Programmable Computers and our Update 43 covering this paper
  * Cohere and Cohere for AI (+ earlier episode w/ Sara Hooker on C4AI)
  * David Chalmers interview on AI + consciousness

Feb 23, 2023 • 2h 3min

Ken Liu: What Science Fiction Can Teach Us

In episode 61 of The Gradient Podcast, Daniel Bashir speaks to Ken Liu.

Ken is an author of speculative fiction. A winner of the Nebula, Hugo, and World Fantasy awards, he is the author of the silkpunk epic fantasy series Dandelion Dynasty and the short story collections The Paper Menagerie and Other Stories and The Hidden Girl and Other Stories. Prior to writing full-time, Ken worked as a software engineer, corporate lawyer, and litigation consultant.

Outline:
* (00:00) Intro
* (02:00) How Ken Liu became Ken Liu: A Saga
* (03:10) Time in the tech industry, interest in symbolic machines
* (04:40) Determining what stories to write
* (07:00) Art as failed communication
* (07:55) Law as creating abstract machines, importance of successful communication, stories in law
* (13:45) Misconceptions about science fiction
* (18:30) How we’ve been misinformed about literature and stories in school, stories as expressing multivalent truths, Dickens on narration (29:00)
* (31:20) Stories as imposing structure on the world
* (35:25) Silkpunk as aesthetic and writing approach
* (39:30) If modernity is a translated experience, what is it translated from? Alternative sources for the American pageant
* (47:30) The value of silkpunk for technologists and building the future
* (52:40) The engineer as poet
* (59:00) Technology language as constructing societies, what it is to be a technologist
* (1:04:00) The technology of language
* (1:06:10) The Google Wordcraft Workshop and co-writing with LaMDA
* (1:14:10) Possibilities and limitations of LMs in creative writing
* (1:18:45) Ken’s short fiction
* (1:19:30) Short fiction as a medium
* (1:24:45) “The Perfect Match” (from The Paper Menagerie and Other Stories)
* (1:34:00) Possibilities for better recommender systems
* (1:39:35) “Real Artists” (from The Hidden Girl and Other Stories)
* (1:47:00) The scaling hypothesis and creativity
* (1:50:25) “The Gods have not died in vain” & Moore’s Proof epigraph (The Hidden Girl)
* (1:53:10) More of The Singularity Trilogy (The Hidden Girl)
* (1:58:00) The role of science fiction today and how technologists should engage with stories
* (2:01:53) Outro

Links:
* Ken’s homepage
* The Dandelion Dynasty series: Speaking Bones is out in paperback
* Books/stories/projects mentioned:
  * “Evaluative Soliloquies” in Google Wordcraft
  * The Paper Menagerie and Other Stories
  * The Hidden Girl and Other Stories

Feb 16, 2023 • 1h 43min

Hattie Zhou: Lottery Tickets and Algorithmic Reasoning in LLMs

In episode 60 of The Gradient Podcast, Daniel Bashir speaks to Hattie Zhou.

Hattie is a PhD student at the Université de Montréal and Mila. Her research focuses on understanding how and why neural networks work, based on the belief that the performance of modern neural networks exceeds our understanding, and that building more capable and trustworthy models requires bridging this gap. Prior to Mila, she spent time as a data scientist at Uber and did research with Uber AI Labs.

Outline:
* (00:00) Intro
* (01:55) Hattie’s Origin Story, Uber AI Labs, empirical theory and other sorts of research
* (10:00) Intro to the Lottery Ticket Hypothesis & Deconstructing Lottery Tickets
* (14:30) Lottery tickets as lucky initialization
* (17:00) Types of masking and the “masking is training” claim
* (24:00) Type-0 masks and weight evolution over long training trajectories
* (27:00) Can you identify good masks or training trajectories a priori?
* (29:00) The role of signs in neural net initialization
* (35:27) The Supermask
* (41:00) Masks to probe pretrained models and model steerability
* (47:40) Fortuitous Forgetting in Connectionist Networks
* (54:00) Relationships to other work (double descent, grokking, etc.)
* (1:01:00) The iterative training process in fortuitous forgetting, scale and value of exploring alternatives
* (1:03:35) In-Context Learning and Teaching Algorithmic Reasoning
* (1:09:00) Learning + algorithmic reasoning, prompting strategy
* (1:13:50) What’s happening with in-context learning?
* (1:14:00) Induction heads
* (1:17:00) ICL and gradient descent
* (1:22:00) Algorithmic prompting vs. discovery
* (1:24:45) Future directions for algorithmic prompting
* (1:26:30) Interesting work from NeurIPS 2022
* (1:28:20) Hattie’s perspective on scientific questions people pay attention to, underrated problems
* (1:34:30) Hattie’s perspective on ML publishing culture
* (1:42:12) Outro

Links:
* Hattie’s homepage and Twitter
* Papers:
  * Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask
  * Fortuitous Forgetting in Connectionist Networks
  * Teaching Algorithmic Reasoning via In-context Learning

Feb 9, 2023 • 2h 8min

Kyunghyun Cho: Neural Machine Translation, Language, and Doing Good Science

In episode 59 of The Gradient Podcast, Daniel Bashir speaks to Professor Kyunghyun Cho.

Professor Cho is an associate professor of computer science and data science at New York University and a CIFAR Fellow of Learning in Machines & Brains. He is also a senior director of frontier research on the Prescient Design team within Genentech Research & Early Development. He was a research scientist at Facebook AI Research from 2017 to 2020 and a postdoctoral fellow at the University of Montreal under the supervision of Prof. Yoshua Bengio after receiving his MSc and PhD degrees from Aalto University. He received the Samsung Ho-Am Prize in Engineering in 2021.

Outline:
* (00:00) Intro
* (02:15) How Professor Cho got into AI, going to Finland for a PhD
* (06:30) Accidental and non-accidental parts of Prof Cho’s journey, the role of timing in career trajectories
* (09:30) Prof Cho’s M.Sc. thesis on Restricted Boltzmann Machines
* (17:00) The state of autodiff at the time
* (20:00) Finding non-mainstream problems and examining limitations of mainstream approaches, anti-dogmatism, Yoshua Bengio appreciation
* (24:30) Detaching identity from work, scientific training
* (26:30) The rest of Prof Cho’s PhD, the first ICLR conference, working in Yoshua Bengio’s lab
* (34:00) Prof Cho’s isolation during his PhD and its impact on his work—transcending insecurity and working on unsexy problems
* (41:30) The importance of identifying important problems and developing an independent research program, ceiling on the number of important research problems
* (46:00) Working on Neural Machine Translation, Jointly Learning to Align and Translate
* (1:01:45) What RNNs and earlier NN architectures can still teach us, why transformers were successful
* (1:08:00) Science progresses gradually
* (1:09:00) Learning distributed representations of sentences, extending the distributional hypothesis
* (1:21:00) Difficulty and limitations in evaluation—directions of dynamic benchmarks, trainable evaluation metrics
* (1:29:30) Mixout and AdapterFusion: fine-tuning and intervening on pre-trained models, pre-training as initialization, destructive interference
* (1:39:00) Analyzing neural networks as reading tea leaves
* (1:44:45) Importance of healthy skepticism for scientists
* (1:45:30) Language-guided policies and grounding, vision-language navigation
* (1:55:30) Prof Cho’s reflections on 2022
* (2:00:00) Obligatory ChatGPT content
* (2:04:50) Finding balance
* (2:07:15) Outro

Links:
* Professor Cho’s homepage and Twitter
* Papers:
  * M.Sc. thesis and PhD thesis
  * NMT and attention:
    * Properties of NMT
    * Learning Phrase Representations
    * Neural machine translation by jointly learning to align and translate
  * More recent work:
    * Learning Distributed Representations of Sentences from Unlabelled Data
    * Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
    * Generative Language-Grounded Policy in Vision-and-Language Navigation with Bayes’ Rule
    * AdapterFusion: Non-Destructive Task Composition for Transfer Learning

Feb 2, 2023 • 1h 10min

Steve Miller: Will AI Take Your Job? It's Not So Simple.

In episode 58 of The Gradient Podcast, Daniel Bashir speaks to Professor Steve Miller.

Steve is a Professor Emeritus of Information Systems at Singapore Management University. Steve served as Founding Dean of the SMU School of Information Systems, and established and developed the technology core of SIS research and project capabilities in Cybersecurity, Data Management & Analytics, Intelligent Systems & Decision Analytics, and Software & Cyber-Physical Systems, as well as the management science-oriented capability in Information Systems & Management. Steve works closely with a number of Singapore government ministries and agencies via steering committees, advisory boards, and advisory appointments.

Outline:
* (00:00) Intro
* (02:40) Steve’s evolution of interests in AI, time in academia and industry
* (05:15) How different is this “industrial revolution”?
* (10:00) What new technologies enable, the human role in technology’s impact on jobs
* (11:35) Automation and augmentation and the realities of integrating new technologies in the workplace
* (21:50) Difficulties of applying AI systems in real-world contexts
* (32:45) Re-calibrating human work with intelligent machines
* (39:00) Steve’s thinking on the nature of human/machine intelligence, implications for human/machine hybrid work
* (47:00) Tradeoffs in using ML systems for automation/augmentation
* (52:40) Organizational adoption of AI and speed
* (1:01:55) Technology adoption is more than just a technology problem
* (1:04:50) Progress narratives, “safe to speed”
* (1:10:27) Outro

Links:
* Steve’s SMU faculty profile and Google Scholar
* Working with AI by Steve Miller and Tom Davenport

Jan 26, 2023 • 58min

Blair Attard-Frost: Canada’s AI strategy and the ethics of AI business practices

In episode 57 of The Gradient Podcast, Andrey Kurenkov speaks to Blair Attard-Frost.

Note: this interview was recorded 8 months ago, and some aspects of Canada’s AI strategy have changed since then. It is still a good overview of AI governance and other topics, however.

Blair is a PhD candidate at the University of Toronto’s Faculty of Information who researches the governance and management of artificial intelligence. More specifically, they are interested in the social construction of intelligence, unintelligence, and artificial intelligence; the relationship between organizational values and AI use; and the political economy, governance, and ethics of AI value chains. They integrate perspectives from service sciences, cognitive sciences, public policy, information management, and queer studies in their research.

Outline:
* Intro
* Getting into AI research
* What is AI governance?
* Canada’s AI strategy
* Other interests

Links:
* Once a promising leader, Canada’s artificial-intelligence strategy is now a fragmented laggard
* The Ethics of AI Business Practices: A Review of 47 Guidelines
