

TalkRL: The Reinforcement Learning Podcast
Robin Ranjit Singh Chauhan
TalkRL podcast is All Reinforcement Learning, All the Time.
In-depth interviews with brilliant people at the forefront of RL research and practice.
Guests from places like MILA, OpenAI, MIT, DeepMind, Berkeley, Amii, Oxford, Google Research, Brown, Waymo, Caltech, and Vector Institute.
Hosted by Robin Ranjit Singh Chauhan.
Episodes

May 13, 2021 • 58min
Marc G. Bellemare
Professor Marc G. Bellemare is a Research Scientist at Google Research (Brain team), an Adjunct Professor at McGill University, and a Canada CIFAR AI Chair.
Featured References
The Arcade Learning Environment: An Evaluation Platform for General Agents
  Marc G. Bellemare, Yavar Naddaf, Joel Veness, Michael Bowling
Human-level control through deep reinforcement learning
  Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg & Demis Hassabis
Autonomous navigation of stratospheric balloons using reinforcement learning
  Marc G. Bellemare, Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C. Machado, Subhodeep Moitra, Sameera S. Ponda & Ziyu Wang
Additional References
CAIDA Talk: A tour of distributional reinforcement learning, November 18, 2020 - Marc G. Bellemare
Amii AI Seminar Series: Autonomous nav of stratospheric balloons using RL, Marlos C. Machado
UMD RLSS | Marc Bellemare | A History of Reinforcement Learning: Atari to Stratospheric Balloons
TalkRL: Marlos C. Machado, Dr. Machado also spoke to us about various aspects of ALE and Project Loon in depth
Hyperbolic discounting and learning over multiple horizons, Fedus et al 2019
Marc G. Bellemare on Twitter

May 8, 2021 • 1h 19min
Robert Osazuwa Ness
Robert Osazuwa Ness is an adjunct professor of computer science at Northeastern University, an ML Research Engineer at Gamalon, and the founder of AltDeep School of AI. He holds a PhD in statistics. He studied at Johns Hopkins SAIS and then Purdue University.
References
Altdeep School of AI, Altdeep on Twitch, Substack, Robert Ness
Altdeep Causal Generative Machine Learning Minicourse, Free course
Robert Osazuwa Ness on Google Scholar
Gamalon Inc
Causal Reinforcement Learning talks, Elias Bareinboim
The Bitter Lesson, Rich Sutton 2019
The Need for Biases in Learning Generalizations, Tom Mitchell 1980
Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics, Kansky et al 2017

Apr 12, 2021 • 1h 32min
Marlos C. Machado
Dr. Marlos C. Machado is a research scientist at DeepMind and an adjunct professor at the University of Alberta. He holds a PhD from the University of Alberta and an MSc and BSc from UFMG, in Brazil.
Featured References
Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents
  Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew J. Hausknecht, Michael Bowling
Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning [video]
  Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare
Efficient Exploration in Reinforcement Learning through Time-Based Representations
  Marlos C. Machado
A Laplacian Framework for Option Discovery in Reinforcement Learning [video]
  Marlos C. Machado, Marc G. Bellemare, Michael H. Bowling
Eigenoption Discovery through the Deep Successor Representation
  Marlos C. Machado, Clemens Rosenbaum, Xiaoxiao Guo, Miao Liu, Gerald Tesauro, Murray Campbell
Exploration in Reinforcement Learning with Deep Covering Options
  Yuu Jinnai, Jee Won Park, Marlos C. Machado, George Dimitri Konidaris
Autonomous navigation of stratospheric balloons using reinforcement learning
  Marc G. Bellemare, Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C. Machado, Subhodeep Moitra, Sameera S. Ponda & Ziyu Wang
Generalization and Regularization in DQN
  Jesse Farebrother, Marlos C. Machado, Michael Bowling
Additional References
Amii AI Seminar Series: Marlos C. Machado - Autonomous navigation of stratospheric balloons using RL
State of the Art Control of Atari Games Using Shallow Reinforcement Learning, Liang et al
Introspective Agents: Confidence Measures for General Value Functions, Sherstan et al

Mar 22, 2021 • 51min
Nathan Lambert
Nathan Lambert is a PhD Candidate at UC Berkeley.
Featured References
Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning
  Nathan O. Lambert, Albert Wilcox, Howard Zhang, Kristofer S. J. Pister, Roberto Calandra
Objective Mismatch in Model-based Reinforcement Learning
  Nathan Lambert, Brandon Amos, Omry Yadan, Roberto Calandra
Low Level Control of a Quadrotor with Deep Model-Based Reinforcement Learning
  Nathan O. Lambert, Daniel S. Drew, Joseph Yaconelli, Roberto Calandra, Sergey Levine, Kristofer S.J. Pister
On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning
  Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra
Additional References
Nathan Lambert's blog
Nathan Lambert on Google Scholar

Mar 16, 2021 • 46min
Kai Arulkumaran
Kai Arulkumaran is a researcher at Araya in Tokyo.
Featured References
AlphaStar: An Evolutionary Computation Perspective
  Kai Arulkumaran, Antoine Cully, Julian Togelius
Analysing Deep Reinforcement Learning Agents Trained with Domain Randomisation
  Tianhong Dai, Kai Arulkumaran, Tamara Gerbert, Samyakh Tukra, Feryal Behbahani, Anil Anthony Bharath
Training Agents using Upside-Down Reinforcement Learning
  Rupesh Kumar Srivastava, Pranav Shyam, Filipe Mutz, Wojciech Jaśkowski, Jürgen Schmidhuber
Additional References
Araya
NNAISENSE
Kai Arulkumaran on Google Scholar
https://github.com/Kaixhin/rlenvs
https://github.com/Kaixhin/Atari
https://github.com/Kaixhin/Rainbow
Tschiatschek, S., Arulkumaran, K., Stühmer, J. & Hofmann, K. (2018). Variational Inference for Data-Efficient Model Learning in POMDPs. arXiv:1805.09281.
Arulkumaran, K., Dilokthanakul, N., Shanahan, M. & Bharath, A. A. (2016). Classifying Options for Deep Reinforcement Learning. International Joint Conference on Artificial Intelligence, Deep Reinforcement Learning Workshop.
Garnelo, M., Arulkumaran, K. & Shanahan, M. (2016). Towards Deep Symbolic Reinforcement Learning. Annual Conference on Neural Information Processing Systems, Deep Reinforcement Learning Workshop.
Arulkumaran, K., Deisenroth, M. P., Brundage, M. & Bharath, A. A. (2017). Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine.
Agostinelli, A., Arulkumaran, K., Sarrico, M., Richemond, P. & Bharath, A. A. (2019). Memory-Efficient Episodic Control Reinforcement Learning with Dynamic Online k-means. Annual Conference on Neural Information Processing Systems, Workshop on Biological and Artificial Reinforcement Learning.
Sarrico, M., Arulkumaran, K., Agostinelli, A., Richemond, P. & Bharath, A. A. (2019). Sample-Efficient Reinforcement Learning with Maximum Entropy Mellowmax Episodic Control. Annual Conference on Neural Information Processing Systems, Workshop on Biological and Artificial Reinforcement Learning.

Jan 26, 2021 • 1h 1min
Michael Dennis
Michael Dennis is a PhD student at the Center for Human-Compatible AI at UC Berkeley, supervised by Professor Stuart Russell.
"I'm interested in robustness in RL and multi-agent RL, specifically as it applies to making the interaction between AI systems and society at large to be more beneficial." --Michael Dennis
Featured References
Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [PAIRED]
  Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine
  Videos
Adversarial Policies: Attacking Deep Reinforcement Learning
  Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell
  Homepage and Videos
Accumulating Risk Capital Through Investing in Cooperation
  Charlotte Roman, Michael Dennis, Andrew Critch, Stuart Russell
Quantifying Differences in Reward Functions [EPIC]
  Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike
Additional References
Safe Opponent Exploitation, Sam Ganzfried and Tuomas Sandholm 2015
Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning, Natasha Jaques et al 2019
Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research, Leibo et al 2019
Leveraging Procedural Generation to Benchmark Reinforcement Learning, Karl Cobbe et al 2019
Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions, Wang et al 2019
Consequences of Misaligned AI, Zhuang et al 2020
Conservative Agency via Attainable Utility Preservation, Turner et al 2019

Jan 11, 2021 • 42min
Roman Ring
Roman Ring is a Research Engineer at DeepMind.
Featured References
Grandmaster level in StarCraft II using multi-agent reinforcement learning
  Vinyals et al, 2019
Replicating DeepMind StarCraft II Reinforcement Learning Benchmark with Actor-Critic Methods
  Roman Ring, 2018
Additional References
Relational Deep Reinforcement Learning, Zambaldi et al 2018
StarCraft II: A New Challenge for Reinforcement Learning, Vinyals et al 2017
Safe and Efficient Off-Policy Reinforcement Learning [Retrace(λ)], Munos et al 2016
Sample Efficient Actor-Critic with Experience Replay [ACER], Wang et al 2016
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures [IMPALA/V-trace], Espeholt et al 2018

Dec 6, 2020 • 54min
Shimon Whiteson
Shimon Whiteson is a Professor of Computer Science at Oxford University, the head of WhiRL, the Whiteson Research Lab at Oxford, and Head of Research at Waymo UK.
Featured References
VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
  Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson
Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
  Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson
Additional References
Shimon Whiteson - Multi-agent RL, MIT Embodied Intelligence Seminar
The StarCraft Multi-Agent Challenge, Samvelyan et al 2019
Direct Policy Transfer with Hidden Parameter Markov Decision Processes, Yao et al 2018
Value-Decomposition Networks For Cooperative Multi-Agent Learning, Sunehag et al 2017
Whiteson Research Lab
Waymo acquires Latent Logic to accelerate progress towards safe, driverless vehicles, Oxford News
Waymo

Sep 21, 2020 • 1h 25min
Aravind Srinivas
Aravind Srinivas, a PhD student at UC Berkeley, discusses the importance of learning better representations of data. The conversation covers Contrastive Predictive Coding (CPC) and its applications, contrastive learning in reinforcement learning, encoding frames, the evolution of unsupervised learning, ongoing projects in his lab, and recent reinforcement learning papers and industry developments.

Aug 17, 2020 • 1h 30min
Taylor Killian
Taylor Killian is a Ph.D. student at the University of Toronto and the Vector Institute, and an Intern at Google Brain.
Featured References
Direct Policy Transfer with Hidden Parameter Markov Decision Processes
  Yao, Killian, Konidaris, Doshi-Velez
Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes
  Killian, Daulton, Konidaris, Doshi-Velez
Transfer Learning Across Patient Variations with Hidden Parameter Markov Decision Processes
  Killian, Konidaris, Doshi-Velez
Counterfactually Guided Policy Transfer in Clinical Settings
  Killian, Ghassemi, Joshi
Additional References
Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations, Doshi-Velez, Konidaris
MIMIC-III, a freely accessible critical care database. Johnson AEW, Pollard TJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, and Mark RG
The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care, Komorowski et al