
Machine Learning Street Talk (MLST)

Latest episodes

Sep 16, 2020 • 1h 26min

Explainability, Reasoning, Priors and GPT-3

This week Dr. Tim Scarfe and Dr. Keith Duggar discuss Explainability, Reasoning, Priors and GPT-3. We check out Christoph Molnar's book on interpretability, talk about priors vs experience in NNs and whether NNs are reasoning at all, and cover articles by Gary Marcus and Walid Saba critiquing deep learning. We finish with a brief discussion of Chollet's ARC challenge and intelligence paper.

00:00:00 Intro
00:01:17 Explainability and Christoph Molnar's book on interpretability
00:26:45 Explainability - Feature visualisation
00:33:28 Architecture / CPPNs
00:36:10 Invariance and data parsimony, priors and experience, manifolds
00:42:04 What NNs learn / logical view of modern AI (Walid Saba article)
00:47:10 Core knowledge
00:55:33 Priors vs experience
00:59:44 Mathematical reasoning
01:01:56 Gary Marcus on GPT-3
01:09:14 Can NNs reason at all?
01:18:05 Chollet intelligence paper / ARC challenge
Sep 14, 2020 • 1h 28min

SwAV: Unsupervised Learning of Visual Features by Contrasting Cluster Assignments (Mathilde Caron)

This week Dr. Tim Scarfe, Yannic "Lightspeed" Kilcher, Sayak Paul and Ayush Thakur interview Mathilde Caron from Facebook AI Research (FAIR). We discuss the paper Mathilde wrote with her collaborators, "SwAV: Unsupervised Learning of Visual Features by Contrasting Cluster Assignments" (https://arxiv.org/pdf/2006.09882.pdf). This paper presents the latest unsupervised contrastive visual representation algorithm, introducing both a new data augmentation strategy and a new online clustering strategy.

Note: the other authors are Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski and Armand Joulin.

Sayak Paul - @RisingSayak / https://www.linkedin.com/in/sayak-paul/
Ayush Thakur - @ayushthakur0 / https://www.linkedin.com/in/ayush-thakur-731914149/
The article they wrote: https://app.wandb.ai/authors/swav-tf/reports/Unsupervised-Visual-Representation-Learning-with-SwAV--VmlldzoyMjg3Mzg

00:00:00 Yannic probability challenge (CAN YOU SOLVE IT?)
00:01:29 Intro topic (Tim)
00:08:18 Yannic take
00:09:33 Intro show and guests
00:11:29 SwAV elevator pitch
00:17:31 Clustering approach in general
00:21:17 Sayak and Ayush's article on SwAV
00:23:49 Optimal transport problem / Sinkhorn-Knopp algorithm
00:31:43 Is clustering a natural approach for this?
00:44:19 Image augmentations
00:46:20 Priors vs experience (data)
00:48:32 Life at FAIR
00:52:33 Progress of image augmentation
00:56:10 When things do not go to plan with research
01:01:04 Question on architecture
01:01:43 SwAV results
01:06:26 Reproducing Mathilde's code
01:14:51 Do we need the whole dataset to set the clustering loss?
01:16:40 Self-supervised learning and transfer learning
01:23:25 Link to attention mechanism
01:24:41 Sayak's final thought on why unsupervised is better
01:25:56 Outro

Abstract: "Unsupervised image representations have significantly reduced the gap with supervised pretraining, notably with the recent achievements of contrastive learning methods. These contrastive methods typically work online and rely on a large number of explicit pairwise feature comparisons, which is computationally challenging. In this paper, we propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring to compute pairwise comparisons. Specifically, our method simultaneously clusters the data while enforcing consistency between cluster assignments produced for different augmentations (or “views”) of the same image, instead of comparing features directly as in contrastive learning. Simply put, we use a “swapped” prediction mechanism where we predict the cluster assignment of a view from the representation of another view. Our method can be trained with large and small batches and can scale to unlimited amounts of data. Compared to previous contrastive methods, our method is more memory efficient since it does not require a large memory bank or a special momentum network. In addition, we also propose a new data augmentation strategy, multi-crop, that uses a mix of views with different resolutions in place of two full-resolution views, without increasing the memory or compute requirements much. We validate our findings by achieving 75.3% top-1 accuracy on ImageNet with ResNet-50, as well as surpassing supervised pretraining on all the considered transfer tasks."
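To make the "swapped prediction" idea concrete, here is a minimal numpy sketch (not the paper's implementation): features from two views of an image are scored against a set of prototype vectors, Sinkhorn-Knopp turns those scores into roughly equal-partition codes, and each view then has to predict the code of the other view. The shapes, hyperparameters (eps, tau, iteration count) and function names are illustrative assumptions.

```python
import numpy as np

def sinkhorn(scores, n_iters=3, eps=0.05):
    """Turn prototype scores into soft, roughly equal-partition codes
    via a few Sinkhorn-Knopp normalisation steps (simplified sketch)."""
    Q = np.exp(scores / eps).T              # (K prototypes, B samples)
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=1, keepdims=True); Q /= K   # each row sums to 1/K
        Q /= Q.sum(axis=0, keepdims=True); Q /= B   # each column sums to 1/B
    return (Q * B).T                        # (B, K) codes, rows roughly sum to 1

def softmax(scores, tau=0.1):
    z = scores / tau
    z -= z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def swapped_loss(z1, z2, prototypes):
    """Swapped prediction: each view predicts the code of the other view."""
    s1, s2 = z1 @ prototypes.T, z2 @ prototypes.T   # prototype scores
    q1, q2 = sinkhorn(s1), sinkhorn(s2)             # target codes
    p1, p2 = softmax(s1), softmax(s2)               # predicted assignments
    return -np.mean(np.sum(q2 * np.log(p1) + q1 * np.log(p2), axis=1))

# Toy usage: 8 samples, 16-dim features, 4 prototypes, all L2-normalised.
rng = np.random.default_rng(0)
l2 = lambda x: x / np.linalg.norm(x, axis=1, keepdims=True)
z1, z2 = l2(rng.normal(size=(8, 16))), l2(rng.normal(size=(8, 16)))
prototypes = l2(rng.normal(size=(4, 16)))
print(swapped_loss(z1, z2, prototypes))
```

In the real method the Sinkhorn codes are treated as fixed targets (no gradient flows through them) and the prototypes are learned jointly with the encoder; the sketch only shows how the loss couples the two views.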
Sep 7, 2020 • 1h 35min

UK Algoshambles, Neuralink, GPT-3 and Intelligence

This week Dr. Tim Scarfe, Dr. Keith Duggar and Yannic "Lightspeed" Kilcher respond to the "Algoshambles" exam fiasco in the UK, where the government were forced to step in to standardise grades that had been grossly inflated by the schools. The schools and teachers are all paid on metrics related to the grades received by students; what could possibly go wrong?! The result is that grades lose all their value, and students are coached for the exams rather than actually learning the subject. We also cover the second Francois Chollet interview on the Lex Fridman podcast, including GPT-3, Neuralink, and a discussion of intelligence.

00:00:00 Algoshambles
00:45:40 Lex Fridman/Chollet: Intro
00:55:21 Lex Fridman/Chollet: Neuralink
01:06:28 Lex Fridman/Chollet: GPT-3
01:23:43 Lex Fridman/Chollet: Intelligence discussion
Jul 17, 2020 • 1h 36min

Sayak Paul

This week we spoke with Sayak Paul, who is extremely active in the machine learning community. We discussed the AI landscape in India, unsupervised representation learning, data augmentation and contrastive learning, explainability, abstract scene representations, and finally pruning and the recent superposition paper. I really enjoyed this conversation and I hope you folks do too!

00:00:00 Intro to Sayak
00:17:50 AI landscape in India
00:24:20 Unsupervised representation learning
00:26:11 Data augmentation / contrastive learning
00:59:20 Explainability
01:12:10 Abstract scene representations
01:14:50 Pruning and the superposition paper
Jul 8, 2020 • 1h 46min

Robert Lange on NN Pruning and Collective Intelligence

We speak with Robert Lange! Robert is a PhD student at the Technical University of Berlin. His research combines deep multi-agent reinforcement learning and cognitive science to study the learning dynamics of large collectives. He has a brilliant blog where he distils and explains cutting-edge ML research. We spoke about his story, economics, multi-agent RL, intelligence and AGI, and his recent article summarising the state of the art in neural network pruning.

Robert's article on pruning in NNs: https://roberttlange.github.io/posts/2020/06/lottery-ticket-hypothesis/

00:00:00 Intro
00:04:17 Show start and intro to Robert
00:11:39 Economics background
00:27:20 Intrinsic motivation
00:33:22 Intelligence/consciousness
00:48:16 Lottery ticket / pruning article discussion
01:43:21 Robert's advice for his younger self and the state of deep learning

Robert's LinkedIn: https://www.linkedin.com/in/robert-tjarko-lange-19539a12a/
@RobertTLange

#machinelearning #deeplearning
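As a tiny illustration of what "pruning" refers to in the article discussion (this is not Robert's code), here is a sketch of one-shot global magnitude pruning, the common baseline that lottery-ticket-style work compares against: rank all weights by absolute value and mask out the smallest ones. The layer shapes and sparsity level are arbitrary.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.8):
    """Return binary masks keeping the largest (1 - sparsity) fraction of
    weights, measured globally across all layers."""
    all_mags = np.concatenate([np.abs(w).ravel() for w in weights])
    threshold = np.quantile(all_mags, sparsity)
    return [(np.abs(w) >= threshold).astype(w.dtype) for w in weights]

# Toy usage: two dense layers of an imaginary MLP.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(784, 300)), rng.normal(size=(300, 10))]
masks = magnitude_prune(layers, sparsity=0.9)
print([round(m.mean(), 3) for m in masks])   # fraction of weights kept per layer
```

The lottery ticket hypothesis discussed in the episode goes further: after pruning, the surviving weights are rewound to their initial values and retrained, which the sketch above does not attempt.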
Jun 30, 2020 • 1h 58min

WelcomeAIOverlords (Zak Jost)

We welcome Zak Jost from the WelcomeAIOverlords channel. Zak is an ML research scientist at Amazon. He has a great blog at http://blog.zakjost.com and also a Discord server at https://discord.gg/xh2chKX

WelcomeAIOverlords: https://www.youtube.com/channel/UCxw9_WYmLqlj5PyXu2AWU_g

00:00:00 Intro start
00:01:07 Main show start
00:01:59 Zak's story
00:05:06 YouTube discussion
00:24:12 Understanding papers
00:29:53 Contrastive learning intro
00:33:00 Bootstrap Your Own Latent (BYOL) paper
01:03:13 Graphs in ML and knowledge graphs
01:21:36 Graph use cases - fraud
01:30:15 Knowledge graphs
01:34:22 Graphs in ML
01:38:53 Automated ML
01:57:32 Outro
Jun 24, 2020 • 1h 3min

Facebook Research - Unsupervised Translation of Programming Languages

In this episode of Machine Learning Street Talk, Dr. Tim Scarfe, Yannic Kilcher and Connor Shorten spoke with Marie-Anne Lachaux, Baptiste Roziere and Dr. Guillaume Lample from Facebook AI Research (FAIR) in Paris. They recently released the paper "Unsupervised Translation of Programming Languages", an exciting new approach to learned translation of programming languages (a learned transcoder) using an unsupervised encoder trained on individual monolingual corpora, i.e. no parallel language data needed. The trick they used was that there is significant token overlap between languages when using word-piece embeddings. It was incredible to talk with this talented group of researchers and I hope you enjoy the conversation too.

Yannic's video on this got watched over 120K times! Check it out too: https://www.youtube.com/watch?v=xTzFJIknh7E

Paper: https://arxiv.org/abs/2006.03511
Authors: Marie-Anne Lachaux, Baptiste Roziere, Lowik Chanussot, Guillaume Lample

Abstract: "A transcompiler, also known as source-to-source translator, is a system that converts source code from a high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules, applied to the source code abstract syntax tree. Unfortunately, the resulting translations often lack readability, fail to respect the target language conventions, and require manual modifications in order to work properly. The overall translation process is time-consuming and requires expertise in both the source and target languages, making code-translation projects expensive. Although neural models significantly outperform their rule-based counterparts in the context of natural language translation, their applications to transcompilation have been limited due to the scarcity of parallel data in this domain. In this paper, we propose to leverage recent approaches in unsupervised machine translation to train a fully unsupervised neural transcompiler. We train our model on source code from open source GitHub projects, and show that it can translate functions between C++, Java, and Python with high accuracy. Our method relies exclusively on monolingual source code, requires no expertise in the source or target languages, and can easily be generalized to other programming languages. We also build and release a test set composed of 852 parallel functions, along with unit tests to check the correctness of translations. We show that our model outperforms rule-based commercial baselines by a significant margin."
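To illustrate the token-overlap trick mentioned above, the toy snippet below uses a naive regex tokenizer (standing in for the paper's word-piece/BPE vocabulary, which it is not) to show how many tokens a C++ and a Python implementation of the same function share; shared tokens like keywords, identifiers, digits and operators give the unsupervised model cross-lingual anchor points.

```python
import re

def tokenize(code):
    """Naive tokenizer: identifiers, integer literals, and single symbols."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code))

cpp = "int fib(int n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }"
py  = "def fib(n):\n    return n if n < 2 else fib(n - 1) + fib(n - 2)"

shared = tokenize(cpp) & tokenize(py)
print(sorted(shared))   # e.g. 'fib', 'n', 'return', '1', '2', '(', ')', '+', '-'
```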
Jun 19, 2020 • 2h 34min

Francois Chollet - On the Measure of Intelligence

We cover Francois Chollet's recent paper, "On the Measure of Intelligence".

Abstract: "To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans. Over the past hundred years, there has been an abundance of attempts to define and measure intelligence, across both the fields of psychology and AI. We summarize and critically assess these definitions and evaluation approaches, while making apparent the two historical conceptions of intelligence that have implicitly guided them. We note that in practice, the contemporary AI community still gravitates towards benchmarking intelligence by comparing the skill exhibited by AIs and humans at specific tasks such as board games and video games. We argue that solely measuring skill at any given task falls short of measuring intelligence, because skill is heavily modulated by prior knowledge and experience: unlimited priors or unlimited training data allow experimenters to "buy" arbitrary levels of skills for a system, in a way that masks the system's own generalization power. We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience. Using this definition, we propose a set of guidelines for what a general AI benchmark should look like. Finally, we present a benchmark closely following these guidelines, the Abstraction and Reasoning Corpus (ARC), built upon an explicit set of priors designed to be as close as possible to innate human priors. We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans."
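As a rough schematic of the definition discussed (a deliberate simplification, not the paper's exact Algorithmic Information Theory formulation, which includes additional weighting terms), intelligence is framed as skill-acquisition efficiency: how much generalization difficulty a system overcomes per unit of priors plus experience, averaged over the tasks in its scope.

```latex
% Simplified schematic only; the paper's actual definition is stated in
% Algorithmic Information Theory terms with further weighting factors.
I_{\text{system}} \;\propto\;
\operatorname*{avg}_{T \,\in\, \text{scope}}
\left[
  \frac{\text{generalization difficulty}(T)}
       {\text{priors}(T) + \text{experience}(T)}
\right]
```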
Jun 6, 2020 • 1h 52min

OpenAI GPT-3: Language Models are Few-Shot Learners

In this episode of Machine Learning Street Talk, Tim Scarfe, Yannic Kilcher and Connor Shorten discuss their takeaways from OpenAI's GPT-3 language model. With the help of Microsoft's ZeRO-2 / DeepSpeed optimiser, OpenAI trained a 175 BILLION parameter autoregressive language model. The paper demonstrates how self-supervised language modelling at this scale can perform many downstream tasks without fine-tuning.

00:00:00 Intro
00:00:54 ZeRO 1+2 (model + data parallelism) (Connor)
00:03:17 Recent history of NLP (Tim)
00:06:04 Yannic "Lightspeed" Kilcher's brief overview of GPT-3
00:14:25 Reviewing Yannic's YT comments on his GPT-3 video (Tim)
00:20:26 Main show intro
00:23:03 Is GPT-3 reasoning?
00:28:15 Architecture discussion and autoregressive (GPT*) vs denoising autoencoder (BERT)
00:36:18 Utility of GPT-3 in industry
00:43:03 Can GPT-3 do math? (reasoning / system 1 / system 2)
00:51:03 Generalisation
00:56:48 Esoterics of language models
00:58:46 Architectural trade-offs
01:07:37 Memorization machines and interpretability
01:17:16 Nearest neighbour probes / watermarks
01:20:03 YouTube comments on GPT-3 video
01:21:50 GPT-3 news article generation issue
01:27:36 Sampling data for language models / bias / fairness / politics
01:51:12 Outro

These paradigms of task adaptation are divided into zero-, one-, and few-shot learning. Zero-shot learning is the extreme case where we expect a language model to perform a task such as sentiment classification or extractive question answering without any additional supervision. One- and few-shot learning provide some examples to the model. However, GPT-3's definition of this diverges a bit from the conventional literature: GPT-3 provides one- and few-shot examples in the form of "in-context learning". Instead of fine-tuning the model on a few examples, the model has to use the input to infer the downstream task. For example, the GPT-3 transformer has an input sequence of 2048 tokens, so demonstrations of a task such as Yelp sentiment reviews would have to fit in this input sequence along with the new review.

Thanks for watching! Please subscribe!

Paper links:
GPT-3: https://arxiv.org/abs/2005.14165
ZeRO: https://arxiv.org/abs/1910.02054
ZeRO (blog post): https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/
ZeRO-2 (blog post): https://www.microsoft.com/en-us/research/blog/zero-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale/?OCID=msr_blog_deepspeed2_build_tw

#machinelearning #naturallanguageprocessing #deeplearning #gpt3
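To make "in-context learning" concrete, the sketch below builds a few-shot sentiment prompt by concatenating labelled demonstrations with the new, unlabelled review. The task wording and formatting are hypothetical (not taken from the paper), but the key constraint is as described above: the whole string, demonstrations plus query, must fit inside the model's fixed 2048-token context window, and no weights are updated.

```python
def build_few_shot_prompt(demonstrations, new_review, task_instruction):
    """Concatenate labelled demonstrations with an unlabelled query;
    the model is expected to complete the final 'Sentiment:' line."""
    lines = [task_instruction, ""]
    for review, label in demonstrations:
        lines.append(f"Review: {review}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {new_review}")
    lines.append("Sentiment:")          # left blank for the model to fill in
    return "\n".join(lines)

# Toy usage with two made-up demonstrations (two-shot).
demos = [
    ("The food was cold and the service slow.", "negative"),
    ("Absolutely loved the pasta, will come back!", "positive"),
]
prompt = build_few_shot_prompt(demos, "Decent coffee but overpriced.",
                               "Classify the sentiment of each review.")
print(prompt)
```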
Jun 3, 2020 • 1h 13min

Jordan Edwards: ML Engineering and DevOps on AzureML

This week we had a super insightful conversation with Jordan Edwards, Principal Program Manager for the AzureML team! Jordan is at the coalface of turning machine learning software engineering into a reality for some of Microsoft's largest customers.

ML DevOps is all about increasing the velocity of, and orchestrating, the non-interactive phase of software deployments for ML. We cover ML DevOps and Microsoft Azure ML. We discuss model governance, testing, interpretability and tooling. We cover the age-old dichotomy between science and engineering and how you can bridge the gap with ML DevOps, as well as Jordan's maturity model for ML DevOps. We also cover some of the exciting ML announcements from the recent Microsoft Build conference, i.e. Fairlearn, InterpretML, SEAL, WhiteNoise, OpenAI code generation and OpenAI GPT-3.

00:00:04 Introduction to ML DevOps and Microsoft Build ML announcements
00:10:29 Main show kick-off
00:11:06 Jordan's story
00:14:36 Typical ML DevOps workflow
00:17:38 Tim's articulation of ML DevOps
00:19:31 Interpretability / fairness
00:24:31 Testing / robustness
00:28:10 Using GANs to generate testing data
00:30:26 Gratuitous DL?
00:33:46 Challenges of making an ML DevOps framework / IaaS
00:38:48 Cultural battles in ML DevOps
00:43:04 Maturity model for ML DevOps
00:49:19 "ML: The High-Interest Credit Card of Technical Debt" paper
00:50:19 ML engineering at Microsoft
01:01:20 MLflow
01:03:05 Company-wide governance
01:08:15 What's coming next
01:12:10 Jordan's hilarious piece of advice for his younger self

Super happy with how this turned out, this is not one to miss folks!

#deeplearning #machinelearning #devops #mldevops
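For flavour, here is a minimal sketch of the non-interactive "submit a training run, then register the resulting model" step that an ML DevOps pipeline typically automates, written against what I understand to be the v1 azureml-core Python SDK. The workspace config, compute target name, script and model paths are placeholders, and this is an assumption-laden illustration rather than Jordan's or Microsoft's recommended setup.

```python
# Hypothetical sketch (assumes the v1 azureml-core SDK, a pre-created
# workspace config.json, and a compute cluster named "cpu-cluster").
from azureml.core import Workspace, Experiment, ScriptRunConfig

ws = Workspace.from_config()                      # loads ./config.json
experiment = Experiment(workspace=ws, name="mldevops-demo")

# Non-interactive training: a versioned script, not a notebook, owns the logic.
run_config = ScriptRunConfig(
    source_directory="./src",                     # placeholder paths
    script="train.py",
    compute_target="cpu-cluster",
)
run = experiment.submit(run_config)
run.wait_for_completion(show_output=True)

# Governance hook: register the trained model so any deployment is traceable
# back to the exact run and code that produced it.
model = run.register_model(model_name="demo-model",
                           model_path="outputs/model.pkl")
print(model.name, model.version)
```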
