
Machine Learning Street Talk (MLST)

Latest episodes

Jun 3, 2020 • 1h 13min

Jordan Edwards: ML Engineering and DevOps on AzureML

This week we had a super insightful conversation with Jordan Edwards, Principal Program Manager for the AzureML team! Jordan is at the coalface of turning machine learning software engineering into a reality for some of Microsoft's largest customers. ML DevOps is all about increasing the velocity of, and orchestrating the non-interactive phase of, software deployments for ML. We cover ML DevOps and Microsoft Azure ML. We discuss model governance, testing, interpretability and tooling. We cover the age-old discussion of the dichotomy between science and engineering and how you can bridge the gap with ML DevOps. We cover Jordan's maturity model for ML DevOps. We also cover some of the exciting ML announcements from the recent Microsoft Build conference, including Fairlearn, InterpretML, SEAL, WhiteNoise, OpenAI code generation and OpenAI GPT-3.

00:00:04 Introduction to ML DevOps and Microsoft Build ML Announcements
00:10:29 Main show kick-off
00:11:06 Jordan's story
00:14:36 Typical ML DevOps workflow
00:17:38 Tim's articulation of ML DevOps
00:19:31 Interpretability / Fairness
00:24:31 Testing / Robustness
00:28:10 Using GANs to generate testing data
00:30:26 Gratuitous DL?
00:33:46 Challenges of making an ML DevOps framework / IaaS
00:38:48 Cultural battles in ML DevOps
00:43:04 Maturity model for ML DevOps
00:49:19 ML: High interest credit card of technical debt paper
00:50:19 ML Engineering at Microsoft
01:01:20 MLflow
01:03:05 Company-wide governance
01:08:15 What's coming next
01:12:10 Jordan's hilarious piece of advice for his younger self

Super happy with how this turned out, this is not one to miss folks!

#deeplearning #machinelearning #devops #mldevops
Jun 2, 2020 • 2h 29min

One Shot and Metric Learning - Quadruplet Loss (Machine Learning Dojo)

*Note: this is an episode from Tim's Machine Learning Dojo YouTube channel.

Join Eric Craeymeersch for a wonderful discussion all about ML engineering, computer vision, Siamese networks, contrastive loss, one-shot learning and metric learning.

00:00:00 Introduction
00:11:47 ML Engineering Discussion
00:35:59 Intro to the main topic
00:42:13 Siamese Networks
00:48:36 Mining strategies
00:51:15 Contrastive Loss
00:57:44 Triplet loss paper
01:09:35 Quadruplet loss paper
01:25:49 Eric's Quadloss Medium Article
02:17:32 Metric learning reality check
02:21:06 Engineering discussion II
02:26:22 Outro

In our second paper review call, Tess Ferrandez covered the FaceNet paper from Google, which was a one-shot Siamese network with the so-called triplet loss. It was an interesting change of direction for NN architecture, i.e. using a contrastive loss instead of having a fixed number of output classes. Contrastive architectures have been taking over the ML landscape recently, e.g. SimCLR, MoCo, BERT.

Eric wrote an article about this at the time: https://medium.com/@crimy/one-shot-learning-siamese-networks-and-triplet-loss-with-keras-2885ed022352

He then discovered there was a new approach to one-shot learning in vision using a quadruplet loss and metric learning. Eric wrote a new article and ran several experiments on this at https://medium.com/@crimy/beyond-triplet-loss-one-shot-learning-experiments-with-quadruplet-loss-16671ed51290?source=friends_link&sk=bf41673664ad8a52e322380f2a456e8b

Paper details:

Beyond triplet loss: a deep quadruplet network for person re-identification
https://arxiv.org/abs/1704.01719 (Chen et al., 2017)

"Person re-identification (ReID) is an important task in wide area video surveillance which focuses on identifying people across different cameras. Recently, deep learning networks with a triplet loss become a common framework for person ReID. However, the triplet loss pays main attentions on obtaining correct orders on the training set. It still suffers from a weaker generalization capability from the training set to the testing set, thus resulting in inferior performance. In this paper, we design a quadruplet loss, which can lead to the model output with a larger inter-class variation and a smaller intra-class variation compared to the triplet loss. As a result, our model has a better generalization ability and can achieve a higher performance on the testing set. In particular, a quadruplet deep network using a margin-based online hard negative mining is proposed based on the quadruplet loss for the person ReID. In extensive experiments, the proposed network outperforms most of the state-of-the-art algorithms on representative datasets which clearly demonstrates the effectiveness of our proposed method."

Original FaceNet paper: https://arxiv.org/abs/1503.03832

#deeplearning #machinelearning
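If you want a concrete feel for how the quadruplet loss extends the triplet loss, here is a minimal PyTorch sketch (our own illustration, not code from Eric's Keras article or from the paper); the anchor and positive share a class, the two negatives come from two further classes, and the margin values are just placeholders:

```python
import torch
import torch.nn.functional as F

def quadruplet_loss(anchor, positive, negative1, negative2,
                    margin1=1.0, margin2=0.5):
    # Illustrative sketch of the quadruplet loss (Chen et al., 2017).
    # anchor/positive share a class; negative1 and negative2 come from two
    # further classes, both different from the anchor's. Margins are illustrative.
    d_ap = F.pairwise_distance(anchor, positive)      # intra-class distance
    d_an = F.pairwise_distance(anchor, negative1)     # inter-class, involving the anchor
    d_nn = F.pairwise_distance(negative1, negative2)  # inter-class, not involving the anchor

    # Standard triplet term: positives should sit closer to the anchor than negatives.
    triplet_term = F.relu(d_ap.pow(2) - d_an.pow(2) + margin1)
    # Extra quadruplet term: intra-class distances should also be smaller than
    # distances between pairs of *other* classes, shrinking intra-class variation.
    quad_term = F.relu(d_ap.pow(2) - d_nn.pow(2) + margin2)
    return (triplet_term + quad_term).mean()

# Toy usage with random 128-d embeddings for a batch of 4.
a, p, n1, n2 = (torch.randn(4, 128) for _ in range(4))
loss = quadruplet_loss(a, p, n1, n2)
```

The second hinge term is the whole point of the paper: it also pushes intra-class distances below distances between pairs that don't involve the anchor at all, which is what shrinks intra-class variation and helps generalization to unseen identities.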
May 25, 2020 • 1h 38min

Harri Valpola: System 2 AI and Planning in Model-Based Reinforcement Learning

In this episode of Machine Learning Street Talk, Tim Scarfe, Yannic Kilcher and Connor Shorten interviewed Harri Valpola, CEO and Founder of Curious AI. We continued our discussion of System 1 and System 2 thinking in Deep Learning, as well as miscellaneous topics around model-based Reinforcement Learning. Dr. Valpola describes some of the challenges of modelling industrial control processes, such as water sewage filters and paper mills, with model-based RL. Dr. Valpola and his collaborators recently published "Regularizing Trajectory Optimization with Denoising Autoencoders", which addresses some of the concerns of planning algorithms that exploit inaccuracies in their world models!

00:00:00 Intro to Harri and Curious AI, System 1/System 2
00:04:50 Background on model-based RL challenges from Tim
00:06:26 Other interesting research papers on model-based RL from Connor
00:08:36 Intro to Curious AI's recent NeurIPS paper on model-based RL and denoising autoencoders from Yannic
00:21:00 Main show kick-off, System 1/2
00:31:50 Where does the simulator come from?
00:33:59 Evolutionary priors
00:37:17 Consciousness
00:40:37 How does one build a company like Curious AI?
00:46:42 Deep Q Networks
00:49:04 Planning and model-based RL
00:53:04 Learning good representations
00:55:55 Typical problem Curious AI might solve in industry
01:00:56 Exploration
01:08:00 Their paper: Regularizing Trajectory Optimization with Denoising Autoencoders
01:13:47 What is epistemic uncertainty?
01:16:44 How would Curious AI develop these models?
01:18:00 Explainability and simulations
01:22:33 How System 2 works in humans
01:26:11 Planning
01:27:04 Advice for starting an AI company
01:31:31 Real-world implementation of planning models
01:33:49 Publishing research and openness

We really hope you enjoy this episode, please subscribe!

Regularizing Trajectory Optimization with Denoising Autoencoders: https://papers.nips.cc/paper/8552-regularizing-trajectory-optimization-with-denoising-autoencoders.pdf
Pulp, Paper & Packaging: A Future Transformed through Deep Learning: https://thecuriousaicompany.com/pulp-paper-packaging-a-future-transformed-through-deep-learning/
Curious AI: https://thecuriousaicompany.com/
Harri Valpola Publications: https://scholar.google.com/citations?user=1uT7-84AAAAJ&hl=en&oi=ao

Some interesting papers around model-based RL:
GameGAN: https://cdn.arstechnica.net/wp-content/uploads/2020/05/Nvidia_GameGAN_Research.pdf
Plan2Explore: https://ramanans1.github.io/plan2explore/
World Models: https://worldmodels.github.io/
MuZero: https://arxiv.org/pdf/1911.08265.pdf
PlaNet: A Deep Planning Network for RL: https://ai.googleblog.com/2019/02/introducing-planet-deep-planning.html
Dreamer: Scalable RL using World Models: https://ai.googleblog.com/2020/03/introducing-dreamer-scalable.html
Model-Based RL for Atari: https://arxiv.org/pdf/1903.00374.pdf
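To give a rough flavour of the problem the paper tackles, here is a hypothetical sketch of gradient-based planning in a learned model with a denoising-autoencoder penalty. The `dynamics`, `cost` and `dae` modules are stand-ins (this is our own simplification, not Curious AI's code), and the reconstruction-error penalty is a simplified proxy for the density regularizer in the paper:

```python
import torch

def plan_with_dae_penalty(dynamics, cost, dae, init_state,
                          action_dim, horizon=10, iters=50, lr=0.05, lam=1.0):
    # Hypothetical sketch of DAE-regularized trajectory optimization.
    # dynamics(state, action) -> next state, cost(state, action) -> scalar,
    # dae(trajectory) -> denoised trajectory are placeholder learned modules.
    # The penalty keeps the imagined trajectory near the training-data manifold,
    # so the optimizer cannot exploit regions where the world model is wrong.
    actions = torch.zeros(horizon, action_dim, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        state, total_cost, states = init_state, 0.0, []
        for a in actions:
            state = dynamics(state, a)
            total_cost = total_cost + cost(state, a)
            states.append(state)
        traj = torch.cat([torch.stack(states), actions], dim=-1)
        # DAE reconstruction error is small on-manifold, large off-manifold.
        penalty = ((dae(traj) - traj) ** 2).mean()
        (total_cost + lam * penalty).backward()
        opt.step()
    return actions.detach()
```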
May 22, 2020 • 2h 34min

ICLR 2020: Yoshua Bengio and the Nature of Consciousness

In this episode of Machine Learning Street Talk, Tim Scarfe, Connor Shorten and Yannic Kilcher react to Yoshua Bengio's ICLR 2020 keynote "Deep Learning Priors Associated with Conscious Processing". Bengio takes on many future directions for research in Deep Learning, such as the role of attention in consciousness, sparse factor graphs and causality, and the study of systematic generalization. Bengio also presents big ideas in intelligence that sit on the border between philosophy and practical machine learning. This includes ideas such as consciousness in machines and System 1 and System 2 thinking, as described in Daniel Kahneman's book "Thinking, Fast and Slow". Similar to Yann LeCun's half of the 2020 ICLR keynote, this talk takes on many challenging ideas, and hopefully this video helps you get a better understanding of some of them! Thanks for watching!

Please subscribe for more videos!

Paper Links:
Link to Talk: https://iclr.cc/virtual_2020/speaker_7.html
The Consciousness Prior: https://arxiv.org/abs/1709.08568
Thinking, Fast and Slow: https://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374533555
Systematic Generalization: https://arxiv.org/abs/1811.12889
CLOSURE: Assessing Systematic Generalization of CLEVR Models: https://arxiv.org/abs/1912.05783
Neural Module Networks: https://arxiv.org/abs/1511.02799
Experience Grounds Language: https://arxiv.org/pdf/2004.10151.pdf
Benchmarking Graph Neural Networks: https://arxiv.org/pdf/2003.00982.pdf
On the Measure of Intelligence: https://arxiv.org/abs/1911.01547

Please check out our individual channels as well!
Machine Learning Dojo with Tim Scarfe: https://www.youtube.com/channel/UCXvHuBMbgJw67i5vrMBBobA
Yannic Kilcher: https://www.youtube.com/channel/UCZHmQk67mSJgfCCTn7xBfew
Henry AI Labs: https://www.youtube.com/channel/UCHB9VepY6kYvZjj0Bgxnpbw

00:00:00 Tim and Yannic's takes
00:01:37 Intro to Bengio
00:03:13 System 2, language and Chomsky
00:05:58 Christof Koch on consciousness
00:07:25 Francois Chollet on intelligence and consciousness
00:09:29 Meditation and Sam Harris on consciousness
00:11:35 Connor Intro
00:13:20 Show Main Intro
00:17:55 Priors associated with Conscious Processing
00:26:25 System 1 / System 2
00:42:47 Implicit and Verbalized Knowledge [DON'T MISS THIS!]
01:08:24 Inductive Priors for DL 2.0
01:27:20 Systematic Generalization
01:37:53 Contrast with the Symbolic AI Program
01:54:55 Attention
02:00:25 From Attention to Consciousness
02:05:31 Thoughts, Consciousness, Language
02:06:55 Sparse Factor Graph
02:10:52 Sparse Change in Abstract Latent Space
02:15:10 Discovering Cause and Effect
02:20:00 Factorize the joint distribution
02:22:30 RIMs: Modular Computation
02:24:30 Conclusion

#machinelearning #deeplearning
May 19, 2020 • 2h 12min

ICLR 2020: Yann LeCun and Energy-Based Models

This week Connor Shorten, Yannic Kilcher and Tim Scarfe reacted to Yann LeCun's keynote speech at this year's ICLR conference, which has just passed. ICLR is the number two ML conference and was completely open this year, with all the sessions publicly accessible via the internet. Yann spent most of his talk speaking about self-supervised learning, energy-based models (EBMs) and manifold learning. Don't worry if you hadn't heard of EBMs before, neither had we!

Thanks for watching! Please subscribe!

Paper Links:
ICLR 2020 Keynote Talk: https://iclr.cc/virtual_2020/speaker_7.html
A Tutorial on Energy-Based Learning: http://yann.lecun.com/exdb/publis/pdf/lecun-06.pdf
Concept Learning with Energy-Based Models (Yannic's Explanation): https://www.youtube.com/watch?v=Cs_j-oNwGgg
Concept Learning with Energy-Based Models (Paper): https://arxiv.org/pdf/1811.02486.pdf
Concept Learning with Energy-Based Models (OpenAI Blog Post): https://openai.com/blog/learning-concepts-with-energy-functions/

#deeplearning #machinelearning #iclr #iclr2020 #yannlecun
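For anyone else meeting EBMs for the first time, here is a toy sketch of the basic idea (our own illustration, not anything from Yann's slides): the model assigns a scalar energy to each (input, candidate output) pair, and inference means searching for the candidate with the lowest energy.

```python
import torch
import torch.nn as nn

class ToyEBM(nn.Module):
    # Toy energy-based model: a scalar energy E(x, y) per (input, candidate) pair.
    # Low energy means "compatible". Purely illustrative; not a specific
    # architecture from the keynote.
    def __init__(self, x_dim=8, y_dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim + y_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def energy(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

    def infer(self, x, candidates):
        # Inference = search for the candidate y with the lowest energy.
        energies = torch.stack([self.energy(x, y) for y in candidates])
        return candidates[energies.argmin().item()]

ebm = ToyEBM()
x = torch.randn(8)
candidates = [torch.randn(4) for _ in range(10)]
best_y = ebm.infer(x, candidates)
```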
May 19, 2020 • 1h 27min

The Lottery Ticket Hypothesis with Jonathan Frankle

In this episode of Machine Learning Street Talk, we chat with Jonathan Frankle, author of The Lottery Ticket Hypothesis. Frankle has continued researching Sparse Neural Networks, Pruning, and Lottery Tickets, leading to some really exciting follow-on papers! This chat discusses some of these papers, such as Linear Mode Connectivity, Comparing Rewinding and Fine-tuning in Neural Network Pruning, and more (full list of papers linked below). We also chat about how Jonathan got into Deep Learning research, his information diet, and his work on developing technology policy for Artificial Intelligence!

This was a really fun chat, I hope you enjoy listening to it and learn something from it! Thanks for watching and please subscribe! Huge thanks to everyone on r/MachineLearning who asked questions!

Paper Links discussed in the chat:
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks: https://arxiv.org/abs/1803.03635
Linear Mode Connectivity and the Lottery Ticket Hypothesis: https://arxiv.org/abs/1912.05671
Dissecting Pruned Neural Networks: https://arxiv.org/abs/1907.00262
Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs: https://arxiv.org/abs/2003.00152
What is the State of Neural Network Pruning?: https://arxiv.org/abs/2003.03033
The Early Phase of Neural Network Training: https://arxiv.org/abs/2002.10365
Comparing Rewinding and Fine-tuning in Neural Network Pruning: https://arxiv.org/abs/2003.02389

(Also mentioned)
Block-Sparse GPU Kernels: https://openai.com/blog/block-sparse-gpu-kernels/
Balanced Sparsity for Efficient DNN Inference on GPU: https://arxiv.org/pdf/1811.00206.pdf
Playing the Lottery with Rewards and Multiple Languages: Lottery Tickets in RL and NLP: https://arxiv.org/pdf/1906.02768.pdf

r/MachineLearning question list: https://www.reddit.com/r/MachineLearning/comments/g9jqe0/d_lottery_ticket_hypothesis_ask_the_author_a/

#machinelearning #deeplearning
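As a refresher on the recipe behind the Lottery Ticket work, here is a rough PyTorch sketch of iterative magnitude pruning with weight rewinding. It is our own illustration, not Jonathan's code: `train_fn` and the hyperparameters are placeholders, and a full implementation would also keep the mask applied during training (e.g. via gradient hooks).

```python
import copy
import torch

def find_lottery_ticket(model, train_fn, prune_fraction=0.2, rounds=5):
    # Sketch of iterative magnitude pruning with weight rewinding.
    # train_fn(model) is a placeholder that trains the model in place;
    # prune_fraction and rounds are illustrative, not tuned values.
    init_state = copy.deepcopy(model.state_dict())   # weights at init (or early in training)
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}

    for _ in range(rounds):
        train_fn(model)                               # train the currently-masked network
        for name, param in model.named_parameters():
            if name not in masks:
                continue
            # Prune the smallest-magnitude weights that are still alive in this layer.
            magnitudes = (param.detach() * masks[name]).abs()
            alive = magnitudes[masks[name] > 0]
            threshold = alive.quantile(prune_fraction)
            masks[name] = (magnitudes > threshold).float() * masks[name]
        # Rewind surviving weights to their original values and re-apply the mask.
        model.load_state_dict(init_state)
        with torch.no_grad():
            for name, param in model.named_parameters():
                if name in masks:
                    param.mul_(masks[name])
    return model, masks
```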
May 19, 2020 • 1h 40min

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

In this episode of Machine Learning Street Talk, Tim Scarfe, Yannic Kilcher and Connor Shorten chat about Large-scale Transfer Learning in Natural Language Processing. The Text-to-Text Transfer Transformer (T5) model from Google AI does an exhaustive survey of what's important for Transfer Learning in NLP and what's not. In this conversation, we go through the key takeaways of the paper: the text-to-text input/output format, architecture choice, dataset size and composition, fine-tuning strategy, and how to best use more computation. Beginning with these topics, we diverge into exciting ideas such as embodied cognition, meta-learning, and the measure of intelligence. We are still beginning our podcast journey and really appreciate any feedback from our listeners. Is the chat too technical? Do you prefer group discussions, interviewing experts, or chats between the three of us? Thanks for watching and, if you haven't already, please subscribe!

Paper Links discussed in the chat:
Text-to-Text Transfer Transformer: https://arxiv.org/abs/1910.10683
Experience Grounds Language (relevant to the divergent discussion about embodied cognition): https://arxiv.org/pdf/2004.10151.pdf
On the Measure of Intelligence: https://arxiv.org/abs/1911.01547
Train Large, Then Compress: https://arxiv.org/pdf/2002.11794.pdf
Scaling Laws for Neural Language Models: https://arxiv.org/pdf/2001.08361.pdf
The Illustrated Transformer: http://jalammar.github.io/illustrated...
ELECTRA: https://arxiv.org/pdf/2003.10555.pdf
Transformer-XL: https://arxiv.org/pdf/1901.02860.pdf
Reformer: The Efficient Transformer: https://openreview.net/pdf?id=rkgNKkHtvB
The Evolved Transformer: https://arxiv.org/pdf/1901.11117.pdf
DistilBERT: https://arxiv.org/pdf/1910.01108.pdf
How to generate text (HIGHLY RECOMMEND): https://huggingface.co/blog/how-to-ge...
Tokenizers: https://blog.floydhub.com/tokenization-nlp/
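To make the text-to-text idea concrete: every task, from translation to summarization to classification, is cast as feeding the model a text prompt that names the task and reading a text answer back. Here is a quick sketch using the Hugging Face transformers library (our tooling choice for illustration, not something prescribed by the paper or the episode; it downloads the pretrained t5-small checkpoint):

```python
# Minimal illustration of T5's text-to-text format using Hugging Face transformers.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is text in, text out: the task is named in the prompt itself.
prompts = [
    "translate English to German: The house is wonderful.",
    "summarize: studies have shown that owning a dog is good for you ...",
]
for prompt in prompts:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_length=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```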
May 2, 2020 • 1h 15min

CURL: Contrastive Unsupervised Representations for Reinforcement Learning

According to Yann LeCun, the next big thing in machine learning is unsupervised learning. Self-supervision has changed the entire game in the last few years in deep learning, first transforming the language world with word2vec and BERT, and now it's turning computer vision upside down.

This week Yannic, Connor and I spoke with one of the authors, Aravind Srinivas, who recently co-led the hot-off-the-press CURL: Contrastive Unsupervised Representations for Reinforcement Learning alongside Michael (Misha) Laskin. CURL has had an incredible reception in the ML community in the last month or so. Remember the DeepMind paper which solved the Atari games using the raw pixels? Aravind's approach uses contrastive unsupervised learning to featurise the pixels before applying RL. CURL is the first image-based algorithm to nearly match the sample-efficiency and performance of methods that use state-based features! This is a huge step forwards in being able to apply RL in the real world.

We explore RL and self-supervision for computer vision in detail and find out how Aravind got into machine learning.

Original YouTube Video: https://youtu.be/1MprzvYNpY8

Paper: CURL: Contrastive Unsupervised Representations for Reinforcement Learning
Aravind Srinivas, Michael Laskin, Pieter Abbeel
https://arxiv.org/pdf/2004.04136.pdf

Yannic's analysis video: https://www.youtube.com/watch?v=hg2Q_O5b9w4

#machinelearning #reinforcementlearning #curl #timscarfe #yannickilcher #connorshorten

Music credit: https://soundcloud.com/errxrmusic/in-my-mind
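For the curious, here is a minimal sketch of the contrastive objective at the heart of CURL (our own illustration of the loss described in the paper; the convolutional encoders, random-crop augmentation and momentum update are omitted):

```python
import torch
import torch.nn.functional as F

def curl_contrastive_loss(queries, keys, W):
    # queries: (batch, dim) encodings of one crop per observation (query encoder)
    # keys:    (batch, dim) encodings of a second crop (momentum encoder, detached)
    # W:       learned (dim, dim) bilinear matrix
    # Each query's matching key is its positive; the other keys in the batch are negatives.
    logits = queries @ W @ keys.t()                                      # (batch, batch) similarities
    logits = logits - logits.max(dim=1, keepdim=True).values.detach()    # numerical stability
    labels = torch.arange(queries.size(0))                               # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

# Toy usage: random 50-d features for a batch of 8 augmented observation pairs.
q, k = torch.randn(8, 50), torch.randn(8, 50)
W = torch.randn(50, 50, requires_grad=True)
loss = curl_contrastive_loss(q, k.detach(), W)
```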
Apr 24, 2020 • 1h 13min

Exploring Open-Ended Algorithms: POET

Three YouTubers: Tim Scarfe - Machine Learning Dojo (https://www.youtube.com/channel/UCXvHuBMbgJw67i5vrMBBobA), Connor Shorten - Henry AI Labs (https://www.youtube.com/channel/UCHB9VepY6kYvZjj0Bgxnpbw) and Yannic Kilcher (https://www.youtube.com/channel/UCZHmQk67mSJgfCCTn7xBfew) have made a new YouTube channel called Machine Learning Street Talk. Every week we will talk about the latest and greatest in AI. Subscribe now!

Special guests this week: Dr. Mathew Salvaris (https://www.linkedin.com/in/drmathewsalvaris/), Eric Craeymeersch (https://www.linkedin.com/in/ericcraeymeersch/), Dr. Keith Duggar (https://www.linkedin.com/in/dr-keith-duggar/), Dmitri Soshnikov (https://www.linkedin.com/in/shwars/)

We discuss the new concept of an open-ended, or "AI-Generating", algorithm. Open-endedness is a class of algorithms which generate problems and solutions to increasingly complex and diverse tasks. These algorithms create their own curriculum of learning. Complex tasks become tractable because they are now the final stepping stone in a lineage of progressions. In many respects, it's better to trust the machine to develop the learning curriculum, because the best curriculum might be counter-intuitive. These algorithms can generate a radiating tree of evolving challenges and solutions, just like natural evolution. Evolution has produced endless diversity and complexity, and even produced human intelligence as a side effect! Could AI-generating algorithms be the next big thing in machine learning?

Wang, Rui, et al. "Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions." arXiv preprint arXiv:2003.08536 (2020). https://arxiv.org/abs/2003.08536
Wang, Rui, et al. "Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and their Solutions." arXiv preprint arXiv:1901.01753 (2019). https://arxiv.org/abs/1901.01753

Watch Yannic's video on POET: https://www.youtube.com/watch?v=8wkgDnNxiVs and on the extended POET: https://youtu.be/gbG1X8Xq-T8
Watch Connor's video: https://www.youtube.com/watch?v=jxIkPxkN10U
Uber AI Labs video: https://www.youtube.com/watch?v=RX0sKDRq400

#reinforcementlearning #machinelearning #uber #deeplearning #rl #timscarfe #connorshorten #yannickilcher
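To make the open-ended idea a little more concrete, here is a hypothetical sketch of a POET-style loop (not Uber AI Labs' code; `mutate_env`, `optimize` and `score` are placeholder callables, and the thresholds are illustrative): environments are mutated into new challenges, kept only if they are neither trivial nor impossible for current agents, each agent is optimized in its paired environment, and agents can transfer between environments.

```python
import random

def poet_style_loop(init_env, init_agent, mutate_env, optimize, score,
                    generations=100, min_score=50, max_score=300, max_envs=20):
    # Hypothetical sketch of a POET-style open-ended loop.
    # mutate_env(env) perturbs an environment, optimize(agent, env) returns an
    # improved agent for that environment, and score(agent, env) evaluates it.
    # The thresholds keep new environments neither trivial nor impossible, so the
    # population builds its own curriculum of steadily harder challenges.
    population = [(init_env, init_agent)]
    for _ in range(generations):
        # 1. Occasionally spawn new environments by mutating existing ones.
        parent_env, parent_agent = random.choice(population)
        child_env = mutate_env(parent_env)
        if min_score < score(parent_agent, child_env) < max_score:
            population.append((child_env, parent_agent))
        # 2. Optimize each agent within its paired environment.
        population = [(env, optimize(agent, env)) for env, agent in population]
        # 3. Transfer: let the best agent from anywhere take over each environment.
        population = [
            (env, max((a for _, a in population), key=lambda a: score(a, env)))
            for env, _ in population
        ]
        population = population[-max_envs:]   # cap the number of active environments
    return population
```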
