
Machine Learning Street Talk (MLST)

Latest episodes

Mar 14, 2021 • 1h 40min

#047 Interpretable Machine Learning - Christoph Molnar

Christoph Molnar is one of the main people to know in the space of interpretable ML. In 2018 he released the first version of his incredible online book, Interpretable Machine Learning. Interpretability is often a deciding factor when a machine learning (ML) model is used in a product, a decision process, or in research. Interpretability methods can be used to discover knowledge, to debug or justify the model and its predictions, to control and improve the model, to reason about potential bias in models, and to increase the social acceptance of models. But interpretability methods can also be quite esoteric; they add an additional layer of complexity and potential pitfalls, and require expert knowledge to understand. Is it even possible to understand complex models, or even humans for that matter, in any meaningful way?

Introduction to IML [00:00:00]
Show kickoff [00:13:28]
What makes a good explanation? [00:15:51]
Quantification of how good an explanation is [00:19:59]
Knowledge of the pitfalls of IML [00:22:14]
Are linear models even interpretable? [00:24:26]
Complex math models to explain complex math models? [00:27:04]
Saliency maps are glorified edge detectors [00:28:35]
Challenge on IML -- feature dependence [00:36:46]
Don't leap to using a complex model! Surrogate models can be too dumb [00:40:52]
On airplane pilots. Seeking to understand vs testing [00:44:09]
IML could help us make better models or lead a better life [00:51:53]
Lack of statistical rigor and quantification of uncertainty [00:55:35]
On causality [01:01:09]
Broadening out the discussion to the process or institutional level [01:08:53]
No focus on fairness / ethics? [01:11:44]
Is it possible to condition ML model training on IML metrics? [01:15:27]
Where is IML going? Some of the esoterica of the IML methods [01:18:35]
You can't compress information without common knowledge; the latter becomes the bottleneck [01:23:25]
IML methods used non-interactively? Making IML an engineering discipline [01:31:10]
Tim postscript -- on the lack of effective corporate operating models for IML, security, engineering and ethics [01:36:34]

Explanation in Artificial Intelligence: Insights from the Social Sciences (Tim Miller 2018) https://arxiv.org/pdf/1706.07269.pdf
Seven Myths in Machine Learning Research (Chang 2019), Myth 7: Saliency maps are robust ways to interpret neural networks https://arxiv.org/pdf/1902.06789.pdf
Sanity Checks for Saliency Maps (Adebayo 2020) https://arxiv.org/pdf/1810.03292.pdf
Interpretable Machine Learning: A Guide for Making Black Box Models Explainable https://christophm.github.io/interpretable-ml-book/

Christoph Molnar:
https://www.linkedin.com/in/christoph-molnar-63777189/
https://machine-master.blogspot.com/
https://twitter.com/ChristophMolnar

Please show your appreciation and buy Christoph's book here: https://www.lulu.com/shop/christoph-molnar/interpretable-machine-learning/paperback/product-24449081.html?page=1&pageSize=4

Panel: Connor Tann https://www.linkedin.com/in/connor-tann-a92906a1/, Dr. Tim Scarfe, Dr. Keith Duggar

Video version: https://youtu.be/0LIACHcxpHU
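To make one of the methods discussed above concrete, here is a minimal sketch (our own illustration, not code from the episode or from Christoph's book) of a global surrogate model: fit a small, interpretable decision tree to the predictions of a black-box model and measure its fidelity. If fidelity is low, the surrogate is exactly the kind of "too dumb" explanation discussed at [00:40:52]. The dataset and model choices here are arbitrary placeholders.

```python
# Hypothetical illustration of a global surrogate model, one class of IML method
# covered in Molnar's book. We fit an interpretable tree to the *predictions* of
# a black-box model and measure how faithfully it reproduces them.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

black_box = GradientBoostingClassifier(random_state=0).fit(X, y)
black_box_preds = black_box.predict(X)

# Shallow tree trained to mimic the black box, not the true labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, black_box_preds)

# "Fidelity": how often the surrogate agrees with the black box.
fidelity = accuracy_score(black_box_preds, surrogate.predict(X))
print(f"Surrogate fidelity to the black box: {fidelity:.3f}")
```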
Mar 6, 2021 • 1h 40min

#046 The Great ML Stagnation (Mark Saroufim and Dr. Mathew Salvaris)

Academics think of themselves as trailblazers, explorers, seekers of the truth. Any fundamental discovery involves a significant degree of risk. If an idea is guaranteed to work then it moves from the realm of research to engineering. Unfortunately, this also means that most research careers will invariably be failures, at least if failure is measured via "objective" metrics like citations. Today we discuss the recent article from Mark Saroufim called Machine Learning: The Great Stagnation. We discuss the rise of gentleman scientists, fake rigor, incentives in ML, SOTA-chasing, "graduate student descent", the distribution of talent in ML, and how to learn effectively. With special guest interviewer Mat Salvaris.

Machine Learning: The Great Stagnation [00:00:00]
Main show kick-off [00:16:30]
Great Stagnation article / bad incentive systems in academia [00:18:24]
OpenAI is a media business [00:19:48]
Incentive structures in academia [00:22:13]
SOTA chasing [00:24:47]
F You Money [00:28:53]
Research grants and gentleman scientists [00:29:13]
Following your own gradient of interest and making a contribution [00:33:27]
Marketing yourself to be successful [00:37:07]
Tech companies create the bad incentives [00:42:20]
GPT-3 was SOTA chasing but it seemed really... "good"? Scaling laws? [00:51:09]
Dota / game AI [00:58:39]
Hard to go it alone? [01:02:08]
Reaching out to people [01:09:21]
Willingness to be wrong [01:13:14]
Distribution of talent / tech interviews [01:18:30]
What should you read online and how to learn? Sharing your stuff online and finding your niche [01:25:52]

Mark Saroufim:
https://marksaroufim.substack.com/
http://robotoverlordmanual.com/
https://twitter.com/marksaroufim
https://www.youtube.com/marksaroufim

Dr. Mathew Salvaris:
https://www.linkedin.com/in/drmathewsalvaris/
https://twitter.com/MSalvaris
Feb 28, 2021 • 2h 30min

#045 Microsoft's Platform for Reinforcement Learning (Bonsai)

Microsoft has an interesting strategy with their new "autonomous systems" technology, also known as Project Bonsai. They want to create an interface to abstract away the complexity and esoterica of deep reinforcement learning. They want to fuse together expert knowledge and artificial intelligence all on one platform, so that complex problems can be decomposed into simpler ones. They want to take machine learning Ph.D.s out of the equation and make autonomous systems engineering look more like a traditional software engineering process. It is an ambitious undertaking, but interesting. Reinforcement learning is extremely difficult (as I cover in the video), and if you don't have a team of RL Ph.D.s with tech industry experience, you shouldn't even consider doing it yourself. This is our take on it!

There are 3 chapters in this video:
Chapter 1: Tim's intro and take on RL being hard, intro to Bonsai and machine teaching
Chapter 2: Interview with Scott Stanfield [recorded Jan 2020] 00:56:41
Chapter 3: Traditional street talk episode [recorded Dec 2020] 01:38:13

This is *not* an official communication from Microsoft; all views expressed are personal opinions. There is no MS-confidential information in this video.

With:
Scott Stanfield https://twitter.com/seesharp
Megan Bloemsma https://twitter.com/BloemsmaMegan
Gurdeep Pall (he has not validated anything we have said in this video or been involved in the creation of it) https://www.linkedin.com/in/gurdeep-pall-0aa639bb/

Panel:
Dr. Keith Duggar
Dr. Tim Scarfe
Yannic Kilcher
Feb 25, 2021 • 52min

#044 - Data-efficient Image Transformers (Hugo Touvron)

Today we are going to talk about the Data-efficient Image Transformers (DeiT) paper, of which Hugo is the primary author. One of the recipes of success for vision models since the DL revolution began has been the availability of large training sets. CNNs have been optimized for almost a decade now, including through extensive architecture search, which is prone to overfitting. Motivated by the success of transformer-based models in natural language processing, there has been increasing attention on applying these approaches to vision models. Hugo and his collaborators used a different training strategy and a new distillation token to get a massive increase in sample efficiency with image transformers.

00:00:00 Introduction
00:06:33 Data augmentation is all you need
00:09:53 Now the image patches are the convolutions though?
00:12:16 Where are those inductive biases hiding?
00:15:46 Distillation token
00:21:01 Why different resolutions on training
00:24:14 How data efficient can we get?
00:26:47 Out of domain generalisation
00:28:22 Why are transformers data efficient at all? Learning invariances
00:32:04 Is data augmentation cheating?
00:33:25 Distillation strategies - matching the intermediate teacher representation as well as output
00:35:49 Do ML models learn the same thing for a problem?
00:39:01 What is it like at Facebook AI?
00:41:17 How long is the PhD programme?
00:42:03 Other interests outside of transformers?
00:43:18 Transformers for vision and language
00:47:40 Could we improve transformer models? (Hybrid models)
00:49:03 Biggest challenges in AI?
00:50:52 How far can we go with the data-driven approach?
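For readers who want a concrete picture of the distillation token, here is a hedged PyTorch sketch of the mechanism as described in the paper; this is our paraphrase, not Hugo's code, and the TinyDeiT name and model sizes are made up for illustration. A learnable distillation token is appended alongside the class token; its head is trained against a teacher's hard predictions while the class head is trained on the ground-truth labels.

```python
# Hedged sketch of the DeiT-style distillation token idea (our paraphrase of the
# paper, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDeiT(nn.Module):
    def __init__(self, num_patches=196, dim=192, num_classes=1000, depth=4, heads=3):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.dist_token = nn.Parameter(torch.zeros(1, 1, dim))   # the extra token
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 2, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, num_classes)        # supervised by the labels
        self.head_dist = nn.Linear(dim, num_classes)   # supervised by the teacher

    def forward(self, patch_embeddings):               # (B, num_patches, dim)
        B = patch_embeddings.size(0)
        tokens = torch.cat([self.cls_token.expand(B, -1, -1),
                            self.dist_token.expand(B, -1, -1),
                            patch_embeddings], dim=1) + self.pos_embed
        z = self.encoder(tokens)
        return self.head(z[:, 0]), self.head_dist(z[:, 1])

def hard_distillation_loss(cls_logits, dist_logits, labels, teacher_logits):
    # Half the loss comes from the true labels, half from the teacher's hard decisions.
    teacher_labels = teacher_logits.argmax(dim=-1)
    return 0.5 * F.cross_entropy(cls_logits, labels) + \
           0.5 * F.cross_entropy(dist_logits, teacher_labels)
```

At inference time the two heads' predictions can be combined, for example by averaging them.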
Feb 19, 2021 • 1h 35min

#043 Prof J. Mark Bishop - Artificial Intelligence Is Stupid and Causal Reasoning won't fix it.

Professor Mark Bishop does not think that computers can be conscious or have phenomenological states of consciousness unless we are willing to accept panpsychism, which is the idea that mentality is fundamental and ubiquitous in the natural world, or, put simply, that your goldfish and everything else for that matter has a mind. Panpsychism postulates that distinctions between intelligences are largely arbitrary. Mark's work in the 'philosophy of AI' led to an influential critique of computational approaches to Artificial Intelligence through a thorough examination of John Searle's 'Chinese Room Argument'. Mark just published a paper called "Artificial Intelligence Is Stupid and Causal Reasoning Won't Fix It". He makes it clear in this paper that, in his opinion, computers will never be able to compute everything, understand anything, or feel anything.

00:00:00 Tim intro
00:15:04 Intro
00:18:49 Introduction to Mark's ideas
00:25:49 Some problems are not computable
00:29:57 The Dancing with Pixies fallacy
00:32:36 The observer-relative problem, and it's all in the mapping
00:43:03 Conscious experience
00:53:30 Intelligence without representation; consciousness is something that we do
01:02:36 Consciousness helps us to act autonomously
01:05:13 The Chinese Room argument
01:14:58 Simulation argument and computation doesn't have phenomenal consciousness
01:17:44 Language informs our colour perception
01:23:11 We have our own distinct ontologies
01:27:12 Kurt Gödel, Turing and Penrose and the implications of their work
Feb 11, 2021 • 1h 34min

#042 - Pedro Domingos - Ethics and Cancel Culture

Today we have Professor Pedro Domingos and we are going to talk about activism in machine learning, cancel culture, AI ethics and kernels. In Pedro's book The Master Algorithm, he segmented the AI community into 5 distinct tribes with 5 unique identities (and before you ask, no, the irony of an anti-identitarian doing so was not lost on us!). Pedro recently published an article in Quillette called Beating Back Cancel Culture: A Case Study from the Field of Artificial Intelligence. Domingos has railed against political activism in the machine learning community and cancel culture. Recently Pedro was involved in a controversy in which he asserted that the NeurIPS broader impact statements are an ideological filter mechanism.

Important Disclaimer: All views expressed are personal opinions.

00:00:00 Caveating
00:04:08 Main intro
00:07:44 Cancel culture is a cultural and intellectual weakness
00:12:26 Is cancel culture a post-modern religion?
00:24:46 Should we have gateways and gatekeepers?
00:29:30 Does everything require broader impact statements?
00:33:55 We are stifling diversity (of thought), not promoting it
00:39:09 What is fair and how to do fair?
00:45:11 Models can introduce biases by compressing away minority data
00:48:36 Accurate but unequal soap dispensers
00:53:55 Agendas are not even self-consistent
00:56:42 Is vs Ought: all variables should be used for Is
01:00:38 Fighting back cancellation with cancellation?
01:10:01 Intent and degree matter in right vs wrong
01:11:08 Limiting principles matter
01:15:10 Gradient descent and kernels
01:20:16 Training journey matters more than destination
01:24:36 Can training paths teach us about symmetry?
01:28:37 What is the most promising path to AGI?
01:31:29 Intelligence will lose its mystery
Feb 3, 2021 • 1h 27min

#041 - Biologically Plausible Neural Networks - Dr. Simon Stringer

Dr. Simon Stringer obtained his Ph.D. in mathematical state space control theory and has been a Senior Research Fellow at Oxford University for over 27 years. Simon is the director of the Oxford Centre for Theoretical Neuroscience and Artificial Intelligence, which is based within the Oxford University Department of Experimental Psychology. His department covers vision, spatial processing, motor function, language and consciousness -- in particular, how the primate visual system learns to make sense of complex natural scenes. Dr. Stringer's laboratory houses a team of theoreticians who are developing computer models of a range of different aspects of brain function. Simon's lab is investigating the neural and synaptic dynamics that underpin brain function. An important matter here is the feature-binding problem, which concerns how the visual system represents the hierarchical relationships between features: the visual system must represent hierarchical binding relations across the entire visual field at every spatial scale and level in the hierarchy of visual primitives. We discuss the emergence of self-organised behaviour, complex information processing, invariant sensory representations and hierarchical feature binding, which emerges when you build biologically plausible neural networks with temporal spiking dynamics.

00:00:09 Tim intro
00:09:31 Show kickoff
00:14:37 Hierarchical feature binding and timing of action potentials
00:30:16 Hebb to spike-timing-dependent plasticity (STDP)
00:35:27 Encoding of shape primitives
00:38:50 Is imagination working in the same place in the brain?
00:41:12 Compare to supervised CNNs
00:45:59 Speech recognition, motor system, learning mazes
00:49:28 How practical are these spiking NNs?
00:50:19 Why simulate the human brain?
00:52:46 How much computational power do you gain from differential timings?
00:55:08 Adversarial inputs
00:59:41 Generative / causal component needed?
01:01:46 Modalities of processing, i.e. language
01:03:42 Understanding
01:04:37 Human hardware
01:06:19 Roadmap of NNs?
01:10:36 Interpretability methods for these new models
01:13:03 Won't GPT just scale and do this anyway?
01:15:51 What about trace learning and transformation learning?
01:18:50 Categories of invariance
01:19:47 Biological plausibility

https://www.youtube.com/watch?v=aisgNLypUKs
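As a concrete anchor for the STDP discussion at 00:30:16, here is a generic, textbook-style sketch of the pairwise STDP rule; this is our own illustration, not code from Simon's lab, and the amplitudes and time constants are arbitrary placeholders. The synaptic weight change depends on the relative timing of pre- and post-synaptic spikes.

```python
# Hedged, textbook-style sketch of spike-timing-dependent plasticity (STDP).
import numpy as np

def stdp_delta_w(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight update for a single pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:   # pre fires before post: potentiation (causal pairing)
        return a_plus * np.exp(-dt / tau_plus)
    else:        # post fires before (or with) pre: depression
        return -a_minus * np.exp(dt / tau_minus)

# Causal pairing strengthens the synapse, anti-causal pairing weakens it.
print(stdp_delta_w(t_pre=10.0, t_post=15.0))   # > 0
print(stdp_delta_w(t_pre=15.0, t_post=10.0))   # < 0
```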
Jan 31, 2021 • 1h 36min

#040 - Adversarial Examples (Dr. Nicholas Carlini, Dr. Wieland Brendel, Florian Tramèr)

Adversarial examples have attracted significant attention in machine learning, but the reasons for their existence and pervasiveness remain unclear. There's good reason to believe neural networks look at very different features than we would have expected. As articulated in the 2019 "Adversarial Examples Are Not Bugs, They Are Features" paper, adversarial examples can be directly attributed to the presence of non-robust features: features derived from patterns in the data distribution that are highly predictive, yet brittle and incomprehensible to humans. Adversarial examples don't just affect deep learning models. A cottage industry has sprung up around Threat Modeling in AI and ML Systems and their dependencies. Joining us this evening are some of the leading researchers currently working on adversarial examples:

Florian Tramèr - a fifth-year Ph.D. student in Computer Science at Stanford University
https://floriantramer.com/
https://twitter.com/florian_tramer

Dr. Wieland Brendel - machine learning researcher at the University of Tübingen and co-founder of layer7.ai
https://medium.com/@wielandbr
https://twitter.com/wielandbr

Dr. Nicholas Carlini - research scientist at Google Brain, working in that exciting space between machine learning and computer security.
https://nicholas.carlini.com/

We really hope you enjoy the conversation; remember to subscribe!

Yannic intro [00:00:00]
Tim intro [00:04:07]
Threat taxonomy [00:09:00]
Main show intro [00:11:30]
What's wrong with neural networks? [00:14:52]
The role of memorization [00:19:51]
Anthropomorphization of models [00:22:42]
What's the harm really though / focusing on actual ML security risks [00:27:03]
Shortcut learning / OOD generalization [00:36:18]
Human generalization [00:40:11]
An existential problem in DL: getting the models to learn what we want? [00:41:39]
Defenses to adversarial examples [00:47:15]
What if we had all the data and the labels? Still problems? [00:54:28]
Defenses are easily broken [01:00:24]
Self-deception in academia [01:06:46]
ML security [01:28:15]

https://www.youtube.com/watch?v=2PenK06tvE4
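As a concrete example of how easily such perturbations are found, here is a minimal sketch of the classic fast gradient sign method (FGSM). This is our own toy illustration, not the guests' code; `model` stands for any differentiable image classifier you supply, and the epsilon budget is arbitrary.

```python
# Minimal FGSM sketch: perturb the input in the direction of the sign of the
# loss gradient, bounded by epsilon.
import torch
import torch.nn as nn

def fgsm_attack(model, x, y, epsilon=8 / 255):
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction that *increases* the loss, then clamp to valid pixel range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

# Usage sketch (assumes `model`, `images`, `labels` already exist):
# x_adv = fgsm_attack(model, images, labels)
# print((model(x_adv).argmax(1) != labels).float().mean())  # attack success rate
```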
Jan 23, 2021 • 1h 58min

#039 - Lena Voita - NLP

Lena Voita is a Ph.D. student at the University of Edinburgh and the University of Amsterdam. Previously, she was a research scientist at Yandex Research and worked closely with the Yandex Translate team. She still teaches NLP at the Yandex School of Data Analysis. She has created an exciting new NLP course on her website lena-voita.github.io which you folks need to check out! She has one of the most well-presented blogs we have ever seen, where she discusses her research in an easily digestible manner. Lena has been investigating many fascinating topics in machine learning and NLP. Today we are going to talk about three of her papers and the corresponding blog articles:

Source and Target Contributions to NMT Predictions -- where she talks about the influential dichotomy between the source and the prefix of neural translation models.
https://arxiv.org/pdf/2010.10907.pdf
https://lena-voita.github.io/posts/source_target_contributions_to_nmt.html

Information-Theoretic Probing with MDL -- where Lena proposes a technique of evaluating a model using the minimum description length, or Kolmogorov complexity, of labels given representations, rather than something basic like accuracy.
https://arxiv.org/pdf/2003.12298.pdf
https://lena-voita.github.io/posts/mdl_probes.html

Evolution of Representations in the Transformer -- Lena investigates the evolution of representations of individual tokens in Transformers trained with different training objectives (MT, LM, MLM).
https://arxiv.org/abs/1909.01380
https://lena-voita.github.io/posts/emnlp19_evolution.html

Panel: Dr. Tim Scarfe, Yannic Kilcher, Sayak Paul

00:00:00 Kenneth Stanley / Why Greatness Cannot Be Planned housekeeping
00:21:09 Kilcher intro
00:28:54 Hello Lena
00:29:21 Tim - Lena's NMT paper
00:35:26 Tim - Minimum Description Length / probes paper
00:40:12 Tim - Evolution of representations
00:46:40 Lena's NLP course
00:49:18 The peppermint tea situation
00:49:28 Main show kick-off
00:50:22 Hallucination vs exposure bias
00:53:04 Lena's focus on explaining the models, not SOTA chasing
00:56:34 Probes paper and NLP interpretability
01:02:18 Why standard probing doesn't work
01:12:12 Evolution of representations paper
01:23:53 BERTScore and the "BERT Rediscovers the Classical NLP Pipeline" paper
01:25:10 Is the shifting encoding context because of BERT bidirectionality?
01:26:43 Objective defines which information we lose on input
01:27:59 How influential is the dataset?
01:29:42 Where is the community going wrong?
01:31:55 Thoughts on GOFAI / understanding in NLP?
01:36:38 Lena's NLP course
01:47:40 How to foster better learning / understanding
01:52:17 Lena's toolset and languages
01:54:12 Mathematics is all you need
01:56:03 Programming languages

https://lena-voita.github.io/
https://www.linkedin.com/in/elena-voita/
https://scholar.google.com/citations?user=EcN9o7kAAAAJ&hl=ja
https://twitter.com/lena_voita
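To give a feel for the MDL probing idea, here is a hedged sketch of the online-coding variant as we understand it from the paper and blog post: this is our paraphrase, not Lena's code, and the logistic-regression probe, function names and data fractions are our own choices. Instead of reporting probe accuracy, you measure how many bits are needed to transmit the labels given the representations by training the probe on a growing prefix of the data and summing the cross-entropy it pays on each next block.

```python
# Hedged sketch of online coding for MDL probing (our paraphrase of the idea).
import math
import numpy as np
from sklearn.linear_model import LogisticRegression

def block_codelength_bits(probe, X, y, eps=1e-12):
    """Bits paid for the labels y of block X under the current probe."""
    proba = probe.predict_proba(X)
    col = {c: i for i, c in enumerate(probe.classes_)}
    # Labels the probe has never seen only get the epsilon floor (sketch-level handling).
    return sum(-math.log2(max(proba[i][col[label]] if label in col else eps, eps))
               for i, label in enumerate(y))

def online_codelength(reps, labels, fractions=(0.001, 0.01, 0.1, 0.5, 1.0)):
    # Sketch assumes the earliest block already contains more than one class.
    reps, labels = np.asarray(reps), np.asarray(labels)
    n, k = len(labels), len(set(labels))
    cuts = sorted({max(k, int(f * n)) for f in fractions})
    total_bits = cuts[0] * math.log2(k)          # first block sent with a uniform code
    for start, end in zip(cuts[:-1], cuts[1:]):
        probe = LogisticRegression(max_iter=1000).fit(reps[:start], labels[:start])
        total_bits += block_codelength_bits(probe, reps[start:end], labels[start:end])
    return total_bits   # lower codelength = labels are easier to extract from the representations
```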
Jan 20, 2021 • 2h 46min

#038 - Professor Kenneth Stanley - Why Greatness Cannot Be Planned

Professor Kenneth Stanley is currently a research science manager at OpenAI in San Francisco. We've been dreaming about getting Kenneth on the show since the very beginning of Machine Learning Street Talk. Some of you might recall that our first ever show was on the Enhanced POET paper; of course, Kenneth had his hands all over it. He's been cited over 16,000 times, and his most popular paper, with over 3,000 citations, introduced the NEAT algorithm. His interests are neuroevolution, open-endedness, NNs, artificial life, and AI. He invented the concept of novelty search: search with no clearly defined objective. His key idea is that there is a tyranny of objectives prevailing in every aspect of our lives, society and indeed our algorithms. Crucially, these objectives produce convergent behaviour and thinking, and distract us from discovering the stepping stones which will lead to greatness. He thinks that this monotonic objective obsession, this idea that we need to continue to improve benchmarks every year, is dangerous. He wrote about this in detail in his recent book "Why Greatness Cannot Be Planned", which will be the main topic of discussion in the show. We also cover his ideas on open-endedness in machine learning.

00:00:00 Intro to Kenneth
00:01:16 Show structure disclaimer
00:04:16 Passionate discussion
00:06:26 Why greatness can't be planned and the tyranny of objectives
00:14:40 Chinese finger trap
00:16:28 Perverse incentives and feedback loops
00:18:17 Deception
00:23:29 Maze example
00:24:44 How can we define curiosity or interestingness?
00:26:59 Open-endedness
00:33:01 ICML 2019 and Yannic, POET, the first MLST
00:36:17 Evolutionary algorithms++
00:43:18 POET, the first MLST
00:45:39 A lesson to GOFAI people
00:48:46 Machine learning -- the great stagnation
00:54:34 Actual scientific successes are usually luck, and against the odds -- BioNTech
00:56:21 Picbreeder and NEAT
01:10:47 How Tim applies these ideas to his life and why he runs MLST
01:14:58 Keith skit about UCF
01:15:13 Main show kick-off
01:18:02 Why does Kenneth value serendipitous exploration so much?
01:24:10 Scientific support for Kenneth's ideas in normal life
01:27:12 We should drop objectives to achieve them. An oxymoron?
01:33:13 Isn't this just resource allocation between exploration and exploitation?
01:39:06 Are objectives merely a matter of degree?
01:42:38 How do we allocate funds for treasure hunting in society?
01:47:34 A keen nose for what is interesting, and voting can be dangerous
01:53:00 Committees are the antithesis of innovation
01:56:21 Does Kenneth apply these ideas to his real life?
01:59:48 Divergence vs interestingness vs novelty vs complexity
02:08:13 Picbreeder
02:12:39 Isn't everything novel in some sense?
02:16:35 Imagine if there was no selection pressure?
02:18:31 Is innovation == environment exploitation?
02:20:37 Is it possible to take shortcuts if you already knew what the innovations were?
02:21:11 Go-Explore -- does the algorithm encode the stepping stones?
02:24:41 What does it mean for things to be interestingly different?
02:26:11 Behavioural characterization / diversity measure to your broad interests
02:30:54 Shaping objectives
02:32:49 Why do all ambitious objectives have deception? Picbreeder analogy
02:35:59 Exploration vs exploitation, science vs engineering
02:43:18 Schools of thought in ML, and could search lead to AGI?
02:45:49 Official ending
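For a concrete flavour of the novelty search idea discussed above, here is a minimal sketch of the novelty-score computation at its heart; this is a generic Python illustration based on the published description of the method, not Kenneth's implementation, and the k value and threshold logic are our own placeholders. Individuals are ranked by how different their behaviour is from behaviours seen so far, not by progress on an objective.

```python
# Hedged sketch of the core of novelty search: novelty of a behaviour is the
# mean distance to its k nearest neighbours among the current population plus
# an archive of previously seen behaviours.
import numpy as np

def novelty_scores(behaviors, archive, k=15):
    """behaviors: (N, D) behaviour characterisations of the current population;
    archive: (M, D) behaviours kept from previous generations."""
    pool = np.vstack([behaviors, archive]) if len(archive) else behaviors
    scores = []
    for b in behaviors:
        dists = np.linalg.norm(pool - b, axis=1)
        dists.sort()
        scores.append(dists[1:k + 1].mean())  # skip dists[0] == 0 (distance to itself)
    return np.array(scores)

# In a search loop you would select parents by novelty rather than fitness,
# and add sufficiently novel behaviours to the archive, e.g.:
# novel_enough = novelty_scores(pop_behaviors, archive) > threshold
# archive = np.vstack([archive, pop_behaviors[novel_enough]])
```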
