
Machine Learning Street Talk (MLST)

Latest episodes

Jun 16, 2022 • 1h 8min

#77 - Vitaliy Chiley (Cerebras)

Vitaliy Chiley is a Machine Learning Research Engineer at the next-generation computing hardware company Cerebras Systems. We spoke about how DL workloads, including sparse workloads, can run faster on Cerebras hardware.

[00:00:00] Housekeeping
[00:01:08] Preamble
[00:01:50] Vitaliy Chiley Introduction
[00:03:11] Cerebras architecture
[00:08:12] Memory management and FLOP utilisation
[00:18:01] Centralised vs decentralised compute architecture
[00:21:12] Sparsity
[00:23:47] Does Sparse NN imply Heterogeneous compute?
[00:29:21] Cost of distributed memory stores?
[00:31:01] Activation vs weight sparsity
[00:37:52] What constitutes a dead weight to be pruned?
[00:39:02] Is it still a saving if we have to choose between weight and activation sparsity?
[00:41:02] Cerebras is a cool place to work
[00:44:05] What is sparsity? Why do we need to start dense?
[00:46:36] Evolutionary algorithms on Cerebras?
[00:47:57] How can we start sparse? Google RIGL
[00:51:44] Inductive priors, why do we need them if we can start sparse?
[00:56:02] Why anthropomorphise inductive priors?
[01:02:13] Could Cerebras run a cyclic computational graph?
[01:03:16] Are NNs locality sensitive hashing tables?

References:
Rigging the Lottery: Making All Tickets Winners [RIGL] https://arxiv.org/pdf/1911.11134.pdf
[D] DanNet, the CUDA CNN of Dan Ciresan in Jurgen Schmidhuber's team, won 4 image recognition challenges prior to AlexNet https://www.reddit.com/r/MachineLearning/comments/dwnuwh/d_dannet_the_cuda_cnn_of_dan_ciresan_in_jurgen/
A Spline Theory of Deep Learning [Balestriero] https://proceedings.mlr.press/v80/balestriero18b.html
Jun 9, 2022 • 58min

#76 - LUKAS BIEWALD (Weights and Biases CEO)

Check out Weights and Biases here! https://wandb.me/MLST

Lukas Biewald is an entrepreneur living in San Francisco. He was the founder and CEO of Figure Eight, an internet company that collects training data for machine learning. In 2018, he founded Weights and Biases, a company that creates developer tools for machine learning. Recently WandB got a cash injection of 15 million dollars in its second funding round. Lukas has a bachelor's degree in mathematics and a master's in computer science from Stanford University. He was a research student under the tutelage of the legendary Daphne Koller.

Lukas Biewald https://twitter.com/l2k

[00:00:00] Preamble
[00:01:27] Intro to Lukas
[00:02:46] How did Lukas build two successful startups?
[00:05:49] Rebalancing games with ML
[00:08:14] Elevator pitch for WandB
[00:10:38] Science vs Engineering divide in ML DevOps
[00:14:11] Too much focus on the minutiae?
[00:18:03] Vertical information sharing in large enterprises (metrics)
[00:20:37] Centralised vs Decentralised topology
[00:24:02] Generalisation vs specialisation
[00:28:59] Enhancing explainability
[00:33:14] Should we try to understand "the machine", or is testing / behaviourism enough?
[00:36:55] WandB roadmap
[00:39:06] WandB / ML Ops competitor space?
[00:44:10] How is WandB differentiated from SageMaker / AzureML?
[00:46:02] WandB sponsorship of ML YT channels
[00:48:43] Alternatives to deep learning?
[00:53:47] How to build a business like WandB

Panel: Tim Scarfe Ph.D and Keith Duggar Ph.D
Note: we didn't get paid by Weights and Biases to conduct this interview.
Apr 29, 2022 • 1h 55min

#75 - Emergence [Special Edition] with Dr. DANIELE GRATTAROLA

An emergent behaviour or emergent property can appear when a number of simple entities operate in an environment, forming more complex behaviours as a collective. If emergence happens over disparate size scales, then the reason is usually a causal relation across different scales. Weak emergence describes new properties arising in systems as a result of low-level interactions; these might be interactions between components of the system, or between the components and their environment.

In our epic introduction we focus a lot on the concepts of self-organisation, complex systems, cellular automata, and strong vs weak emergence. In the main show we discuss this in more detail with Dr. Daniele Grattarola and cover his recent NeurIPS paper on learning graph cellular automata.

YT version: https://youtu.be/MDt2e8XtUcA
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB

Featuring: Dr. Daniele Grattarola, Dr. Tim Scarfe, Dr. Keith Duggar, Prof. David Chalmers, Prof. Ken Stanley, Prof. Julian Togelius, Dr. Joscha Bach, David Ha, Dr. Pei Wang

[00:00:00] Special Edition Intro: Emergence and Cellular Automata
[00:49:02] Intro to Daniele and CAs
[00:57:23] Numerical analysis link with CA (PDEs)
[00:59:50] The representational dichotomy of discrete and continuous at different scales
[01:05:21] Universal computation in CAs
[01:10:27] Computational irreducibility
[01:16:33] Is the universe discrete?
[01:20:49] Emergence but with the same computational principle
[01:23:10] How do you formalise the emergent phenomenon?
[01:25:44] Growing cellular automata
[01:33:53] Open-ended and unbounded computation is required for this kind of behaviour
[01:37:31] Graph cellular automata
[01:43:40] Connection to protein folding
[01:46:24] Are CAs the best tool for the job?
[01:49:37] Where to go to find more information
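The simple-rules-to-complex-behaviour idea in the blurb above can be made concrete with an elementary cellular automaton. The sketch below implements Wolfram's Rule 110, a purely local update rule that is nonetheless known to be Turing-complete; it is an illustrative toy, not code from the episode or from Daniele's paper.

```python
def step(cells, rule=110):
    """One update of an elementary cellular automaton.
    Each cell's next state depends only on itself and its two
    neighbours (wrapping at the edges); the 3-bit neighbourhood
    indexes into the bits of the rule number."""
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

# Start from a single live cell and watch structure emerge.
cells = [0] * 31
cells[15] = 1
for _ in range(8):
    print("".join("#" if c else "." for c in cells))
    cells = step(cells)
```

Despite the rule fitting in a single byte, iterating it produces the kind of globally complex, collectively organised patterns the introduction describes as weak emergence.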
Apr 14, 2022 • 1h 6min

#74 Dr. ANDREW LAMPINEN - Symbolic behaviour in AI [UNPLUGGED]

Please note that in this interview Dr. Lampinen was expressing his personal opinions, which do not necessarily represent those of DeepMind.

Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB
YT version: https://youtu.be/yPMtSXXn4OY

Dr. Andrew Lampinen is a Senior Research Scientist at DeepMind, and he thinks that symbols are subjective in the relativistic sense. Dr. Lampinen completed his PhD in Cognitive Psychology at Stanford University. His background is in mathematics, physics, and machine learning. Andrew has said that his research interests are in cognitive flexibility and generalisation, and how these abilities are enabled by factors like language, memory, and embodiment.

Andrew and his coauthors have just released a paper called Symbolic Behaviour in Artificial Intelligence. Andrew leads in the paper by saying that the human ability to use symbols has yet to be replicated in machines. He thinks that one of the key areas to bridge the gap is considering how symbol meaning is established, and he strongly believes it is the symbol users themselves who agree upon the symbol meaning, and that the use of symbols entails behaviours which coalesce agreements about their meaning. In plain English: symbols are defined by behaviours rather than by their content.

[00:00:00] Intro to Andrew and the Symbolic Behaviour paper
[00:07:01] Semantics underpins the unreasonable effectiveness of symbols
[00:12:56] The Depth of Subjectivity
[00:21:03] Walid Saba - universal cognitive templates
[00:27:47] Insufficiently Darwinian
[00:30:52] Discovered vs invented
[00:34:19] Does language have primacy?
[00:35:59] Research directions
[00:39:43] Comparison to Ben Goertzel's OpenCog and human-compatible AI
[00:42:53] Aligning AI with our culture
[00:47:55] Do we need to model the worst aspects of human behaviour?
[00:50:57] Fairness
[00:54:24] Memorisation in LLMs
[01:00:38] Wason selection task
[01:03:45] Would an Andrew hashtable robot be intelligent?

Dr. Andrew Lampinen https://lampinen.github.io/ https://twitter.com/AndrewLampinen

References:
Symbolic Behaviour in Artificial Intelligence https://arxiv.org/abs/2102.03406
Imitating Interactive Intelligence https://arxiv.org/abs/2012.05672 https://www.deepmind.com/publications/imitating-interactive-intelligence
Impact of Pretraining Term Frequencies on Few-Shot Reasoning [Yasaman Razeghi] https://arxiv.org/abs/2202.07206
BIG-bench dataset https://github.com/google/BIG-bench
Teaching Autoregressive Language Models Complex Tasks By Demonstration [Recchia] https://arxiv.org/pdf/2109.02102.pdf
Wason selection task https://en.wikipedia.org/wiki/Wason_selection_task
Gary Lupyan https://psych.wisc.edu/staff/lupyan-gary/
Apr 7, 2022 • 56min

#73 - YASAMAN RAZEGHI & Prof. SAMEER SINGH - NLP benchmarks

Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB
YT version: https://youtu.be/RzGaI7vXrkk

This week we speak with Yasaman Razeghi and Prof. Sameer Singh from UC Irvine. Yasaman recently published a paper called Impact of Pretraining Term Frequencies on Few-Shot Reasoning, where she demonstrated comprehensively that large language models only perform well on reasoning tasks because they memorise the dataset. For the first time she showed the accuracy was linearly correlated with the occurrence rate in the training corpus, something which OpenAI should have done in the first place!

We also speak with Sameer, who has been a pioneering force in machine learning interpretability for many years now; he created LIME with Marco Ribeiro and also had his hands all over the famous CheckList paper and many others. We also get into the metric obsession in the NLP world and whether metrics are one of the principal reasons why we are failing to make any progress in NLU.

[00:00:00] Impact of Pretraining Term Frequencies on Few-Shot Reasoning
[00:14:59] Metrics
[00:18:55] Definition of reasoning
[00:25:12] Metrics (again)
[00:28:52] On true believers
[00:33:04] Sameer's work on model explainability / LIME
[00:36:58] Computational irreducibility
[00:41:07] ML DevOps and CheckList
[00:45:58] Future of ML DevOps
[00:49:34] Thinking about the future

Prof. Sameer Singh https://sameersingh.org/
Yasaman Razeghi https://yasamanrazeghi.com/

References:
Impact of Pretraining Term Frequencies on Few-Shot Reasoning [Razeghi et al. with Singh] https://arxiv.org/pdf/2202.07206.pdf
Beyond Accuracy: Behavioral Testing of NLP Models with CheckList [Ribeiro et al. with Singh] https://arxiv.org/pdf/2005.04118.pdf
"Why Should I Trust You?" Explaining the Predictions of Any Classifier (LIME) [Ribeiro et al. with Singh] https://arxiv.org/abs/1602.04938
Tim interviewing LIME creator Marco Ribeiro in 2019 https://www.youtube.com/watch?v=6aUU-Ob4a8I
Tim's video on LIME/SHAP on his other channel https://www.youtube.com/watch?v=jhopjN08lTM
Our interview with Christoph Molnar https://www.youtube.com/watch?v=0LIACHcxpHU
Interpretable Machine Learning book @ChristophMolnar https://christophm.github.io/interpretable-ml-book/
Machine Teaching: A New Paradigm for Building Machine Learning Systems [Simard] https://arxiv.org/abs/1707.06742
Whimsical notes on machine teaching https://whimsical.com/machine-teaching-Ntke9EHHSR25yHnsypHnth
Gopher paper (DeepMind) https://www.deepmind.com/blog/language-modelling-at-scale-gopher-ethical-considerations-and-retrieval https://arxiv.org/pdf/2112.11446.pdf
EleutherAI https://www.eleuther.ai/ https://github.com/kingoflolz/mesh-transformer-jax/ https://pile.eleuther.ai/
A Theory of Universal Artificial Intelligence based on Algorithmic Complexity [Hutter] https://arxiv.org/pdf/cs/0004001.pdf
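The frequency-accuracy analysis described above (correlating how often terms appear in the pretraining corpus with few-shot accuracy) can be sketched in a few lines. The numbers below are invented purely for illustration; the real paper computes term counts over the actual pretraining corpus and measures model accuracy per instance.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical numbers: per-instance pretraining term frequency
# vs the model's few-shot accuracy on that instance.
term_frequency = [1e3, 5e3, 2e4, 9e4, 4e5, 1e6]
accuracy = [0.12, 0.20, 0.34, 0.51, 0.66, 0.79]

log_freq = [math.log10(f) for f in term_frequency]
print(f"Pearson r (log frequency vs accuracy): {pearson(log_freq, accuracy):.2f}")
```

A strong positive correlation here is exactly the memorisation signature the episode discusses: performance tracks corpus statistics rather than reasoning ability.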
Mar 29, 2022 • 1h 25min

#72 Prof. KEN STANLEY 2.0 - On Art and Subjectivity [UNPLUGGED]

YT version: https://youtu.be/DxBZORM9F-8
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB

Prof. Ken Stanley argued in his book that our world has become saturated with objectives. The process of setting an objective, attempting to achieve it, and measuring progress along the way has become the primary route to achievement in our culture. He's not saying that objectives are bad per se, especially if they're modest, but he thinks that when goals are ambitious the search space becomes deceptive. Is the key to artificial intelligence really related to intelligence? Does taking a job with a higher salary really bring you closer to being a millionaire? The problem is that the stepping stones which lead to ambitious objectives tend to be pretty strange; they don't resemble the final end state at all. Vacuum tubes led to computers, for example, and YouTube started as a dating website.

What fascinated us about this conversation with Ken is that we got a much deeper understanding of his philosophy. He led by saying that he thought it's worth questioning whether artificial intelligence is even a science. Ken thinks that the secret to future progress is for us to embrace more subjectivity.

[00:00:00] Tim Intro
[00:12:54] Intro
[00:17:08] Seeing ideas everywhere - AI and art are highly connected
[00:28:40] Creativity in Mathematics
[00:30:14] Where is the intelligence in art?
[00:38:49] Is AI disappointingly simple to mechanise?
[00:42:48] Slightly conscious
[00:46:27] Do we have subjective experience?
[00:50:23] Fear of the unknown
[00:51:48] Free Will
[00:54:22] Chalmers
[00:55:08] What's happening now in open-endedness
[00:58:31] Generalisation
[01:06:34] Representation primitives and what it means to understand
[01:12:37] Appeal to definitions; knowledge itself blocks discovery

Make sure you buy Kenneth's book!

References:
Why Greatness Cannot Be Planned: The Myth of the Objective [Stanley, Lehman] https://www.amazon.co.uk/Why-Greatness-Cannot-Planned-Objective/dp/3319155237
Abandoning Objectives: Evolution through the Search for Novelty Alone [Lehman, Stanley] https://www.cs.swarthmore.edu/~meeden/DevelopmentalRobotics/lehman_ecj11.pdf
Twitter https://twitter.com/kenneth0stanley
Mar 25, 2022 • 1h 3min

#71 - ZAK JOST (Graph Neural Networks + Geometric DL) [UNPLUGGED]

Special discount link for Zak's GNN course: https://bit.ly/3uqmYVq
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB
YT version: https://youtu.be/jAGIuobLp60 (there are lots of helper graphics there, recommended if possible)
Want to sponsor MLST? Let us know on LinkedIn / Twitter.

[00:00:00] Preamble
[00:03:12] Geometric deep learning
[00:10:04] Message passing
[00:20:42] Top down vs bottom up
[00:24:59] All NN architectures are different forms of information diffusion processes (squashing and smoothing problem)
[00:29:51] Graph rewiring
[00:31:38] Back to information diffusion
[00:42:43] Transformers vs GNNs
[00:47:10] Equivariant subgraph aggregation networks + WL test
[00:55:36] Do equivariant layers aggregate too?
[00:57:49] Zak's GNN course

Exhaustive list of references on the YT show URL (https://youtu.be/jAGIuobLp60)
Mar 19, 2022 • 1h 19min

#70 - LETITIA PARCALABESCU - Symbolics, Linguistics [UNPLUGGED]

Today we are having a discussion with Letitia Parcalabescu from the AI Coffee Break YouTube channel! We discuss linguistics, symbolic AI, and our respective YouTube channels. Make sure you subscribe to her channel! In the first 15 minutes Tim dissects the recent article from Gary Marcus, "Deep learning has hit a wall".

Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB
YT: https://youtu.be/p2D2duT-R2E

[00:00:00] Comments on Gary Marcus article / Symbolic AI
[00:14:57] Greetings
[00:17:40] Introduction
[00:18:48] A shared journey towards computation
[00:22:10] A linguistics outsider
[00:24:11] Is computational linguistics AI?
[00:28:23] Swinging pendulums of dogma and resource allocation
[00:31:16] The road less travelled
[00:34:35] Pitching grants with multimodality ... and then the truth
[00:40:50] Some aspects of language are statistically learnable
[00:44:58] ... and some aspects of language are dimensionally cursed
[00:48:24] It's good to have both approaches to machine intelligence
[00:51:14] The world runs on symbols
[00:54:28] There is much more to learn from biology
[00:59:26] Letitia's creation process
[01:02:23] Don't overfit content; instead publish and iterate
[01:07:48] Merging the big picture arrow from the small direction arrows
[01:11:02] Use passion to drive through failure to success
[01:12:56] Stay positive
[01:16:02] Closing remarks
Mar 12, 2022 • 51min

#69 DR. THOMAS LUX - Interpolation of Sparse High-Dimensional Data

Today we are speaking with Dr. Thomas Lux, a research scientist at Meta in Silicon Valley.

In some sense, all of supervised machine learning can be framed through the lens of geometry. All training data exists as points in Euclidean space, and we want to predict the value of a function at all those points. Neural networks appear to be the modus operandi these days for many domains of prediction. In that light, we might ask ourselves: what makes neural networks better, from a geometric perspective, than classical techniques like k-nearest neighbours? Our guest today has done research on exactly that problem, trying to define error bounds for approximations in terms of directions, distances, and derivatives.

The insights from Thomas's work point at why neural networks are so good at problems which everything else fails at, like image recognition. The key is in their ability to ignore parts of the input space, do nonlinear dimension reduction, and concentrate their approximation power on important parts of the function.

[00:00:00] Intro to Show
[00:04:11] Intro to Thomas (Main show kick off)
[00:04:56] Interpolation of Sparse High-Dimensional Data
[00:12:19] Where does one place the basis functions to partition the space, the perennial question
[00:16:20] The sampling phenomenon -- where did all those dimensions come from?
[00:17:40] The placement of the MLP basis functions, they are not where you think they are
[00:23:15] NNs only extrapolate when given explicit priors to do so, CNNs in the translation domain
[00:25:31] Transformers extrapolate in the permutation domain
[00:28:26] NN priors work by creating space junk everywhere
[00:36:44] Are vector spaces the way to go? On discrete problems
[00:40:23] Activation functions
[00:45:57] What can we prove about NNs?
References:
Interpolation of Sparse High-Dimensional Data [Lux] https://tchlux.github.io/papers/tchlux-2020-NUMA.pdf
A Spline Theory of Deep Learning [Balestriero] https://proceedings.mlr.press/v80/balestriero18b.html
Gradients without Backpropagation '22 https://arxiv.org/pdf/2202.08587.pdf
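To ground the geometric framing above, here is a minimal 1-nearest-neighbour predictor: prediction is pure geometry over points in Euclidean space, with none of the adaptive partitioning and dimension reduction that Thomas argues gives neural networks their edge. The data and function below are invented for illustration.

```python
import math

def nearest_neighbour_predict(train_points, train_values, query):
    """Predict f(query) as the value at the closest training point
    (1-nearest-neighbour): no learned parameters, just distances."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    best = min(range(len(train_points)),
               key=lambda i: dist(train_points[i], query))
    return train_values[best]

# Toy data: points in 2-D Euclidean space with known function values.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [0.0, 1.0, 1.0, 2.0]  # f(x, y) = x + y at the sample points

print(nearest_neighbour_predict(points, values, (0.9, 0.1)))
```

Because the prediction is constant within each nearest-neighbour cell, this method treats every input dimension equally; in high dimensions, most of those dimensions are irrelevant, which is exactly where the episode argues neural networks pull ahead.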
Mar 7, 2022 • 1h 42min

#68 DR. WALID SABA 2.0 - Natural Language Understanding [UNPLUGGED]

Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/HNnAwSduud
YT version: https://youtu.be/pMtk-iUaEuQ

Dr. Walid Saba is an old-school polymath. He has a background in cognitive psychology, linguistics, philosophy, computer science, and logic, and he's now a Senior Scientist at Sorcero. Walid is perhaps the most outspoken critic of BERTOLOGY, which is to say trying to solve the problem of natural language understanding through the application of large statistical language models. Walid thinks this approach is cursed to failure because it's analogous to memorising infinity with a large hashtable. Walid thinks that the various appeals to infinity by some deep learning researchers are risible.

[00:00:00] MLST Housekeeping
[00:08:03] Dr. Walid Saba Intro
[00:11:56] AI Cannot Ignore Symbolic Logic, and Here's Why
[00:23:39] Main show - Proposition: Statistical learning doesn't work
[01:04:44] Discovering a sorting algorithm bottom-up is hard
[01:17:36] The axioms of nature (universal cognitive templates)
[01:31:06] MLPs are locality sensitive hashing tables

References:
The Missing Text Phenomenon, Again: the case of Compound Nominals https://ontologik.medium.com/the-missing-text-phenomenon-again-the-case-of-compound-nominals-abb6ece3e205
A Spline Theory of Deep Networks https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf
The Defeat of the Winograd Schema Challenge https://arxiv.org/pdf/2201.02387.pdf
Impact of Pretraining Term Frequencies on Few-Shot Reasoning https://twitter.com/yasaman_razeghi/status/1495112604854882304?s=21 https://arxiv.org/abs/2202.07206
AI Cannot Ignore Symbolic Logic, and Here's Why https://medium.com/ontologik/ai-cannot-ignore-symbolic-logic-and-heres-why-1f896713525b
Learnability can be undecidable http://gtts.ehu.es/German/Docencia/1819/AC/extras/s42256-018-0002-3.pdf
Scaling Language Models: Methods, Analysis & Insights from Training Gopher https://arxiv.org/pdf/2112.11446.pdf
DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning https://arxiv.org/abs/2006.08381
On the Measure of Intelligence [Chollet] https://arxiv.org/abs/1911.01547
A Formal Theory of Commonsense Psychology: How People Think People Think https://www.amazon.co.uk/Formal-Theory-Commonsense-Psychology-People/dp/1107151007
Continuum hypothesis https://en.wikipedia.org/wiki/Continuum_hypothesis
Gödel numbering + completeness theorems https://en.wikipedia.org/wiki/G%C3%B6del_numbering https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems
Concepts: Where Cognitive Science Went Wrong [Jerry A. Fodor] https://oxford.universitypressscholarship.com/view/10.1093/0198236360.001.0001/acprof-9780198236368
