Data Science at Home

Francesco Gadaleta
Nov 27, 2019 • 38min

More powerful deep learning with transformers (Ep. 84) (Rebroadcast)

Some of the most powerful NLP models, like BERT and GPT-2, have one thing in common: they are all built on the transformer architecture. That architecture, in turn, rests on another concept already well known to the community: self-attention. In this episode I explain what these mechanisms are, how they work, and why they are so powerful. Don't forget to subscribe to our Newsletter or join the discussion on our Discord server.

References

Attention Is All You Need https://arxiv.org/abs/1706.03762
The Illustrated Transformer https://jalammar.github.io/illustrated-transformer
Self-Attention for Generative Models http://web.stanford.edu/class/cs224n/slides/cs224n-2019-lecture14-transformers.pdf
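To make self-attention concrete, here is a minimal NumPy sketch of scaled dot-product attention, following the formulation in "Attention Is All You Need"; the shapes and weight matrices are illustrative assumptions, not material from the episode.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project the input sequence into queries, keys and values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Every token attends to every other token; scaling by sqrt(d_k)
    # keeps the dot products from saturating the softmax.
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V  # weighted sum of value vectors

# Toy example: a sequence of 4 tokens with model dimension 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```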
Nov 18, 2019 • 28min

How to improve the stability of training a GAN (Ep. 88)

Generative Adversarial Networks, or GANs, are very powerful tools for generating data. However, training a GAN is not easy. More specifically, GANs suffer from three major issues: instability of the training procedure, mode collapse, and vanishing gradients.

In this episode I explain not only the most challenging issues one encounters while designing and training Generative Adversarial Networks, but also some methods and architectures to mitigate them. In addition, I elucidate three specific strategies that researchers are considering to improve the accuracy and reliability of GANs.

The most tedious issues of GANs

Convergence to equilibrium

A typical GAN is formed by at least two networks: a generator G and a discriminator D. The generator's task is to generate samples from random noise. In turn, the discriminator has to learn to distinguish fake samples from real ones. While it is theoretically possible for generator and discriminator to converge to a Nash equilibrium (at which both networks are in their optimal state), reaching such an equilibrium is not easy.

Vanishing gradients

Moreover, a very accurate discriminator pushes the loss function towards lower and lower values. This, in turn, can cause the gradient to vanish and the entire network to stop learning altogether.

Mode collapse

Another phenomenon that is easy to observe when dealing with GANs is mode collapse: the inability of the model to generate diverse samples. This leads to generated data that look more and more alike, so that the entire generated dataset ends up concentrated around a particular statistical value.

The solution

Researchers have considered several approaches to overcome these issues, experimenting with architectural changes, different loss functions, and game theory. A minimal training loop that shows where these failure modes arise is sketched below.

Listen to the full episode to learn more about the most effective strategies for building GANs that are reliable and robust. Don't forget to join the conversation on our new Discord channel. See you there!
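As a hedged illustration of the two-player setup described above, here is a minimal GAN training loop in PyTorch. The architectures, hyperparameters and data (1-D Gaussian samples) are illustrative assumptions, not from the episode.

```python
import torch
import torch.nn as nn

# Toy setup: learn to generate samples from a 1-D Gaussian.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # generator
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, 1) * 2 + 3          # "real" data: N(3, 2)
    noise = torch.randn(64, 8)

    # --- Discriminator step: tell real from fake.
    # If D becomes too accurate, the generator's gradient vanishes.
    fake = G(noise).detach()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # --- Generator step: fool the discriminator.
    # If G finds one output D likes, it may produce only that: mode collapse.
    loss_g = bce(D(G(noise)), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```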
Nov 12, 2019 • 20min

What if I train a neural network with random data? (with Stanisław Jastrzębski) (Ep. 87)

What happens to a neural network trained with random data? Are massive neural networks just lookup tables, or do they truly learn something? Today's episode is about memorisation and generalisation in deep learning, with Stanisław Jastrzębski from New York University. Stan spent two summers as a visiting student with Prof. Yoshua Bengio and has been working on:

Understanding and improving how deep networks generalise
Representation learning
Natural language processing
Computer-aided drug design

What makes deep learning unique?

I asked him a few questions I had been seeking answers to for a long time. For instance, what does deep learning bring to the table that other methods don't, or can't? Stan believes that the one thing that makes deep learning special is representation learning. The competing methods, be they kernel machines or random forests, do not have this capability. Moreover, optimisation (SGD) lies at the heart of representation learning, in the sense that it allows finding good representations.

What really improves the training quality of a neural network?

We discussed how the accuracy of a neural network depends largely on how good Stochastic Gradient Descent is at finding minima of the loss function. What influences such minima? Stan's answer revealed that training-set accuracy, or loss value, is actually not that interesting: it is relatively easy to overfit the data (i.e. achieve the lowest possible loss) given a large enough network and a large enough computational budget. However, the shape of the minima and the performance on validation sets are influenced by optimisation in a quite fascinating way: early in the trajectory, optimisation steers that trajectory towards minima with properties that go far beyond training accuracy. A small experiment that probes the memorisation question directly is sketched after the references.

As always, we spoke about the future of AI and the role deep learning will play. I hope you enjoy the show! Don't forget to join the conversation on our new Discord channel. See you there!

References

Homepage of Stanisław Jastrzębski https://kudkudak.github.io/
A Closer Look at Memorization in Deep Networks https://arxiv.org/abs/1706.05394
Three Factors Influencing Minima in SGD https://arxiv.org/abs/1711.04623
Don't Decay the Learning Rate, Increase the Batch Size https://arxiv.org/abs/1711.00489
Stiffness: A New Perspective on Generalization in Neural Networks https://arxiv.org/abs/1901.09491
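To see the memorisation phenomenon for yourself, here is a minimal sketch in the spirit of the random-label experiments behind "A Closer Look at Memorization in Deep Networks"; the synthetic data and tiny architecture are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Synthetic inputs with labels assigned completely at random:
# there is nothing to generalise here, only to memorise.
X = torch.randn(512, 20)
y = torch.randint(0, 2, (512,))

model = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(2000):
    loss = loss_fn(model(X), y)
    opt.zero_grad(); loss.backward(); opt.step()

# A big enough network drives training accuracy towards 100% even on
# random labels, while accuracy on fresh random data stays at chance
# (~50%), because there is no signal to learn.
train_acc = (model(X).argmax(1) == y).float().mean()
print(f"training accuracy on random labels: {train_acc:.2f}")
```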
Nov 5, 2019 • 45min

Deep learning is easier when it is illustrated (with Jon Krohn) (Ep. 86)

In this episode I speak with Jon Krohn, author of Deep Learning Illustrated, a book that makes deep learning easier to grasp. We also talk about some important guidelines to take into account whenever you implement a deep learning model, how to deal with bias in machine learning used to match jobs to candidates, and the future of AI.

You can purchase the book from informit.com/dsathome with code DSATHOME and get 40% off books/eBooks and 60% off video training.
Nov 4, 2019 • 15min

[RB] How to generate very large images with GANs (Ep. 85)

Join the discussion on our Discord server. In this episode I explain how a research group from the University of Lübeck tackled the curse of dimensionality in the generation of large medical images with GANs. The problem is not as trivial as it seems: many researchers have failed at generating large images with GANs before. One interesting application of such an approach is in medicine, for the generation of CT and X-ray images. Enjoy the show!

References

Multi-scale GANs for Memory-efficient Generation of High Resolution Medical Images https://arxiv.org/abs/1907.01376
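As a rough sketch of the general multi-scale idea (not the paper's exact architecture): generate a cheap low-resolution image first, then synthesise high-resolution patches conditioned on their low-resolution context, so the full image never has to fit in memory at once. Everything below, including the stand-in generators, is an illustrative assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative stand-ins: a tiny "global" generator and a patch generator.
g_low = nn.Sequential(nn.ConvTranspose2d(16, 1, 64))    # noise -> 64x64 image
g_patch = nn.Sequential(nn.Conv2d(1, 1, 3, padding=1))  # context -> refined patch

def generate_large_image(noise, full=1024, patch=128):
    # Stage 1: a global, low-resolution image (cheap in memory).
    low_res = g_low(noise)                               # (1, 1, 64, 64)
    guide = F.interpolate(low_res, size=(full, full),
                          mode="bilinear", align_corners=False)
    # Stage 2: synthesise the full image patch by patch; each patch is
    # conditioned on its upsampled low-resolution context, so only one
    # patch is ever resident in memory at a time.
    out = torch.zeros(1, 1, full, full)
    for i in range(0, full, patch):
        for j in range(0, full, patch):
            context = guide[:, :, i:i+patch, j:j+patch]
            out[:, :, i:i+patch, j:j+patch] = g_patch(context)
    return out

image = generate_large_image(torch.randn(1, 16, 1, 1))
print(image.shape)  # torch.Size([1, 1, 1024, 1024])
```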
Oct 27, 2019 • 38min

More powerful deep learning with transformers (Ep. 84)

Some of the most powerful NLP models, like BERT and GPT-2, have one thing in common: they are all built on the transformer architecture. That architecture, in turn, rests on another concept already well known to the community: self-attention. In this episode I explain what these mechanisms are, how they work, and why they are so powerful. Don't forget to subscribe to our Newsletter or join the discussion on our Discord server.

References

Attention Is All You Need https://arxiv.org/abs/1706.03762
The Illustrated Transformer https://jalammar.github.io/illustrated-transformer
Self-Attention for Generative Models http://web.stanford.edu/class/cs224n/slides/cs224n-2019-lecture14-transformers.pdf
Oct 18, 2019 • 38min

[RB] Replicating GPT-2, the most dangerous NLP model (with Aaron Gokaslan) (Ep. 83)

Join the discussion on our Discord server.

In this episode I am with Aaron Gokaslan, computer vision researcher and AI Resident at Facebook AI Research. Aaron is the author of OpenGPT-2, a replication of the much-discussed NLP model that OpenAI decided not to release because it was considered too accurate to be published. We discuss image-to-image translation, the dangers of the GPT-2 model, and the future of AI. Moreover, Aaron provides some very interesting links and demos that will blow your mind! Enjoy the show!

References

Multimodal image-to-image translation (not all mentioned in the podcast, but recommended by Aaron):
Pix2Pix: https://phillipi.github.io/pix2pix/
CycleGAN: https://junyanz.github.io/CycleGAN/
GANimorph paper: https://arxiv.org/abs/1808.04325 / code: https://github.com/brownvc/ganimorph
UNIT: https://arxiv.org/abs/1703.00848
MUNIT: https://github.com/NVlabs/MUNIT
DRIT: https://github.com/HsinYingLee/DRIT

GPT-2 and related:
Try OpenAI's GPT-2: https://talktotransformer.com/
Blog post: https://blog.usejournal.com/opengpt-2-we-replicated-gpt-2-because-you-can-too-45e34e6d36dc
The original transformer paper: https://arxiv.org/abs/1706.03762
Grover, the fake-news generator and detector: https://rowanzellers.com/grover/
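If you want to play with a GPT-2-style model locally rather than through talktotransformer.com, here is a minimal sketch using the Hugging Face transformers library; this tooling is my assumption, not something discussed in the episode.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the publicly released (small) GPT-2 weights.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The future of AI is"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Nucleus sampling keeps generations diverse but coherent.
output = model.generate(
    input_ids,
    max_length=60,
    do_sample=True,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```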
Oct 15, 2019 • 22min

What is wrong with reinforcement learning? (Ep. 82)

Join the discussion on our Discord server.

With reinforcement learning agents doing great at playing Atari video games, mastering Go, trading financial instruments and modelling language, let me tell you the real story here. In this episode I want to shine some light on reinforcement learning (RL) and the limitations that every practitioner should consider before taking certain directions. RL seems to work so well: what is wrong with it?

Are you a listener of the Data Science at Home podcast? A reader of the Amethix blog? Or did you subscribe to the Artificial Intelligence at your fingertips newsletter? In any case, let's stay in touch! https://amethix.com/survey/

References

Emergence of Locomotion Behaviours in Rich Environments https://arxiv.org/abs/1707.02286
Rainbow: Combining Improvements in Deep Reinforcement Learning https://arxiv.org/abs/1710.02298
AlphaGo Zero: Starting from Scratch https://deepmind.com/blog/article/alphago-zero-starting-scratch
Oct 10, 2019 • 32min

Have you met Shannon? Conversation with Jimmy Soni and Rob Goodman about one of the greatest minds in history (Ep. 81)

Join the discussion on our Discord server.

In this episode I have an amazing conversation with Jimmy Soni and Rob Goodman, authors of "A Mind at Play", a book entirely dedicated to the life and achievements of Claude Shannon. Claude Shannon needs no introduction. But for those who need a refresher, Shannon is the inventor of the information age. Have you heard of binary code, entropy in information theory, data compression theory (the stuff behind mp3, mpg, zip, etc.), error-correcting codes (the stuff that makes your RAM work well), n-grams, block ciphers, the beta distribution, or the uncertainty coefficient? All of that was invented by Claude Shannon :)

Articles:
https://medium.com/the-mission/10-000-hours-with-claude-shannon-12-lessons-on-life-and-learning-from-a-genius-e8b9297bee8f
https://medium.com/the-mission/on-claude-shannons-103rd-birthday-here-are-103-memorable-claude-shannon-quotes-maxims-and-843de4c716cf
http://nautil.us/issue/51/limits/how-information-got-re_invented
http://nautil.us/issue/50/emergence/claude-shannon-the-las-vegas-cheat

Claude's papers:
https://medium.com/the-mission/a-genius-explains-how-to-be-creative-claude-shannons-long-lost-1952-speech-fbbcb2ebe07f
http://www.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf

A Mind at Play (book links):
http://amzn.to/2pasLMz -- Hardcover
https://amzn.to/2oCfVL0 -- Audio
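As a quick taste of Shannon's most famous quantity, here is a minimal sketch computing the entropy H = -Σ p·log2(p) of a discrete distribution; the example distributions are of course made up.

```python
import math

def shannon_entropy(probs):
    # H = -sum(p * log2(p)), with the convention 0 * log2(0) = 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries exactly 1 bit of information per toss...
print(shannon_entropy([0.5, 0.5]))   # 1.0
# ...while a heavily biased coin carries much less.
print(shannon_entropy([0.9, 0.1]))   # ~0.469
```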
Oct 1, 2019 • 34min

Attacking machine learning for fun and profit (with the authors of SecML) (Ep. 80)

Join the discussion on our Discord server.

As ML plays a more and more relevant role in many domains of everyday life, it is hardly surprising to see more and more attacks on ML systems. In this episode we talk about the most popular attacks against machine learning systems, and about some mitigations designed by researchers Ambra Demontis and Marco Melis from the University of Cagliari (Italy). The guests are also the authors of SecML, an open-source Python library for the security evaluation of machine learning algorithms. Both Ambra and Marco are members of the PRALab research group, under the supervision of Prof. Fabio Roli. A toy evasion attack is sketched after the references below.

SecML contributors

Marco Melis (Ph.D. student, project maintainer, https://www.linkedin.com/in/melismarco/)
Ambra Demontis (postdoc, https://pralab.diee.unica.it/it/AmbraDemontis)
Maura Pintor (Ph.D. student, https://it.linkedin.com/in/maura-pintor)
Battista Biggio (assistant professor, https://pralab.diee.unica.it/it/BattistaBiggio)

References

SecML: an open-source Python library for the security evaluation of Machine Learning (ML) algorithms. https://secml.gitlab.io/
A. Demontis et al., "Why Do Adversarial Attacks Transfer? Explaining Transferability of Evasion and Poisoning Attacks," 28th USENIX Security Symposium (USENIX Security 19), 2019, pp. 321–338. https://www.usenix.org/conference/usenixsecurity19/presentation/demontis
P. W. Koh and P. Liang, "Understanding Black-box Predictions via Influence Functions," International Conference on Machine Learning (ICML), 2017. https://arxiv.org/abs/1703.04730
M. Melis, A. Demontis, B. Biggio, G. Brown, G. Fumera, and F. Roli, "Is Deep Learning Safe for Robot Vision? Adversarial Examples Against the iCub Humanoid," 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 2017, pp. 751–759. https://arxiv.org/abs/1708.06939
B. Biggio and F. Roli, "Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning," Pattern Recognition, vol. 84, pp. 317–331, 2018. https://arxiv.org/abs/1712.03141
B. Biggio et al., "Evasion Attacks against Machine Learning at Test Time," Machine Learning and Knowledge Discovery in Databases (ECML PKDD), Part III, vol. 8190, 2013, pp. 387–402. https://arxiv.org/abs/1708.06131
B. Biggio, B. Nelson, and P. Laskov, "Poisoning Attacks against Support Vector Machines," 29th Int'l Conf. on Machine Learning, 2012, pp. 1807–1814. https://arxiv.org/abs/1206.6389
N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma, "Adversarial Classification," Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Seattle, 2004, pp. 99–108. https://dl.acm.org/citation.cfm?id=1014066
M. Sundararajan, A. Taly, and Q. Yan, "Axiomatic Attribution for Deep Networks," Proceedings of the 34th International Conference on Machine Learning, 2017. https://arxiv.org/abs/1703.01365
M. T. Ribeiro, S. Singh, and C. Guestrin, "Model-Agnostic Interpretability of Machine Learning," arXiv preprint arXiv:1606.05386, 2016. https://arxiv.org/abs/1606.05386
W. Guo et al., "LEMNA: Explaining Deep Learning Based Security Applications," Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018. https://dl.acm.org/citation.cfm?id=3243792
S. Bach et al., "On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation," PLoS ONE 10(7), 2015: e0130140. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0130140
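To make "evasion attack" concrete, here is a minimal PyTorch sketch of the classic fast gradient sign method (FGSM). This is a generic illustration of the idea, not SecML's API; the victim model and input are illustrative assumptions.

```python
import torch
import torch.nn as nn

def fgsm_attack(model, x, label, epsilon=0.1):
    """Perturb x in the direction that maximally increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(x), label)
    loss.backward()
    # One step of size epsilon along the sign of the input gradient.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.detach()

# Toy victim model and a single input (e.g. a flattened 28x28 image).
model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
x = torch.rand(1, 784)
label = torch.tensor([3])

x_adv = fgsm_attack(model, x, label)
print(model(x).argmax(1), model(x_adv).argmax(1))  # prediction may flip
```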
