TalkRL: The Reinforcement Learning Podcast

Robin Ranjit Singh Chauhan
May 13, 2021 • 58min

Marc G. Bellemare

Professor Marc G. Bellemare is a Research Scientist at Google Research (Brain team), an Adjunct Professor at McGill University, and a Canada CIFAR AI Chair.

Featured References
The Arcade Learning Environment: An Evaluation Platform for General Agents
  Marc G. Bellemare, Yavar Naddaf, Joel Veness, Michael Bowling
Human-level control through deep reinforcement learning
  Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg & Demis Hassabis
Autonomous navigation of stratospheric balloons using reinforcement learning
  Marc G. Bellemare, Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C. Machado, Subhodeep Moitra, Sameera S. Ponda & Ziyu Wang

Additional References
CAIDA Talk: A tour of distributional reinforcement learning, November 18, 2020 - Marc G. Bellemare
Amii AI Seminar Series: Autonomous nav of stratospheric balloons using RL, Marlos C. Machado
UMD RLSS | Marc Bellemare | A History of Reinforcement Learning: Atari to Stratospheric Balloons
TalkRL: Marlos C. Machado, Dr. Machado also spoke to us about various aspects of ALE and Project Loon in depth
Hyperbolic discounting and learning over multiple horizons, Fedus et al 2019
Marc G. Bellemare on Twitter
May 8, 2021 • 1h 19min

Robert Osazuwa Ness

Robert Osazuwa Ness is an adjunct professor of computer science at Northeastern University, an ML Research Engineer at Gamalon, and the founder of AltDeep School of AI. He holds a PhD in statistics. He studied at Johns Hopkins SAIS and then Purdue University.

References
Altdeep School of AI, Altdeep on Twitch, Substack, Robert Ness
Altdeep Causal Generative Machine Learning Minicourse, Free course
Robert Osazuwa Ness on Google Scholar
Gamalon Inc
Causal Reinforcement Learning talks, Elias Bareinboim
The Bitter Lesson, Rich Sutton 2019
The Need for Biases in Learning Generalizations, Tom Mitchell 1980
Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics, Kansky et al 2017
Apr 12, 2021 • 1h 32min

Marlos C. Machado

Dr. Marlos C. Machado is a research scientist at DeepMind and an adjunct professor at the University of Alberta. He holds a PhD from the University of Alberta and an MSc and BSc from UFMG, in Brazil.

Featured References
Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents
  Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew J. Hausknecht, Michael Bowling
Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning [video]
  Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare
Efficient Exploration in Reinforcement Learning through Time-Based Representations
  Marlos C. Machado
A Laplacian Framework for Option Discovery in Reinforcement Learning [video]
  Marlos C. Machado, Marc G. Bellemare, Michael H. Bowling
Eigenoption Discovery through the Deep Successor Representation
  Marlos C. Machado, Clemens Rosenbaum, Xiaoxiao Guo, Miao Liu, Gerald Tesauro, Murray Campbell
Exploration in Reinforcement Learning with Deep Covering Options
  Yuu Jinnai, Jee Won Park, Marlos C. Machado, George Dimitri Konidaris
Autonomous navigation of stratospheric balloons using reinforcement learning
  Marc G. Bellemare, Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C. Machado, Subhodeep Moitra, Sameera S. Ponda & Ziyu Wang
Generalization and Regularization in DQN
  Jesse Farebrother, Marlos C. Machado, Michael Bowling

Additional References
Amii AI Seminar Series: Marlos C. Machado - Autonomous navigation of stratospheric balloons using RL
State of the Art Control of Atari Games Using Shallow Reinforcement Learning, Liang et al
Introspective Agents: Confidence Measures for General Value Functions, Sherstan et al
Mar 22, 2021 • 51min

Nathan Lambert

Nathan Lambert is a PhD Candidate at UC Berkeley.

Featured References
Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning
  Nathan O. Lambert, Albert Wilcox, Howard Zhang, Kristofer S. J. Pister, Roberto Calandra
Objective Mismatch in Model-based Reinforcement Learning
  Nathan Lambert, Brandon Amos, Omry Yadan, Roberto Calandra
Low Level Control of a Quadrotor with Deep Model-Based Reinforcement Learning
  Nathan O. Lambert, Daniel S. Drew, Joseph Yaconelli, Roberto Calandra, Sergey Levine, Kristofer S. J. Pister
On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning
  Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra

Additional References
Nathan Lambert's blog
Nathan Lambert on Google Scholar
Mar 16, 2021 • 46min

Kai Arulkumaran

Kai Arulkumaran is a researcher at Araya in Tokyo.

Featured References
AlphaStar: An Evolutionary Computation Perspective
  Kai Arulkumaran, Antoine Cully, Julian Togelius
Analysing Deep Reinforcement Learning Agents Trained with Domain Randomisation
  Tianhong Dai, Kai Arulkumaran, Tamara Gerbert, Samyakh Tukra, Feryal Behbahani, Anil Anthony Bharath
Training Agents using Upside-Down Reinforcement Learning
  Rupesh Kumar Srivastava, Pranav Shyam, Filipe Mutz, Wojciech Jaśkowski, Jürgen Schmidhuber

Additional References
Araya
NNAISENSE
Kai Arulkumaran on Google Scholar
https://github.com/Kaixhin/rlenvs
https://github.com/Kaixhin/Atari
https://github.com/Kaixhin/Rainbow
Tschiatschek, S., Arulkumaran, K., Stühmer, J. & Hofmann, K. (2018). Variational Inference for Data-Efficient Model Learning in POMDPs. arXiv:1805.09281.
Arulkumaran, K., Dilokthanakul, N., Shanahan, M. & Bharath, A. A. (2016). Classifying Options for Deep Reinforcement Learning. International Joint Conference on Artificial Intelligence, Deep Reinforcement Learning Workshop.
Garnelo, M., Arulkumaran, K. & Shanahan, M. (2016). Towards Deep Symbolic Reinforcement Learning. Annual Conference on Neural Information Processing Systems, Deep Reinforcement Learning Workshop.
Arulkumaran, K., Deisenroth, M. P., Brundage, M. & Bharath, A. A. (2017). Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine.
Agostinelli, A., Arulkumaran, K., Sarrico, M., Richemond, P. & Bharath, A. A. (2019). Memory-Efficient Episodic Control Reinforcement Learning with Dynamic Online k-means. Annual Conference on Neural Information Processing Systems, Workshop on Biological and Artificial Reinforcement Learning.
Sarrico, M., Arulkumaran, K., Agostinelli, A., Richemond, P. & Bharath, A. A. (2019). Sample-Efficient Reinforcement Learning with Maximum Entropy Mellowmax Episodic Control. Annual Conference on Neural Information Processing Systems, Workshop on Biological and Artificial Reinforcement Learning.
Jan 26, 2021 • 1h 1min

Michael Dennis

Michael Dennis is a PhD student at the Center for Human-Compatible AI at UC Berkeley, supervised by Professor Stuart Russell.

"I'm interested in robustness in RL and multi-agent RL, specifically as it applies to making the interaction between AI systems and society at large to be more beneficial." --Michael Dennis

Featured References
Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [PAIRED]
  Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine
  Videos
Adversarial Policies: Attacking Deep Reinforcement Learning
  Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell
  Homepage and Videos
Accumulating Risk Capital Through Investing in Cooperation
  Charlotte Roman, Michael Dennis, Andrew Critch, Stuart Russell
Quantifying Differences in Reward Functions [EPIC]
  Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike

Additional References
Safe Opponent Exploitation, Sam Ganzfried and Tuomas Sandholm 2015
Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning, Natasha Jaques et al 2019
Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research, Leibo et al 2019
Leveraging Procedural Generation to Benchmark Reinforcement Learning, Karl Cobbe et al 2019
Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions, Wang et al 2019
Consequences of Misaligned AI, Zhuang et al 2020
Conservative Agency via Attainable Utility Preservation, Turner et al 2019
Jan 11, 2021 • 42min

Roman Ring

Roman Ring is a Research Engineer at DeepMind.

Featured References
Grandmaster level in StarCraft II using multi-agent reinforcement learning, Vinyals et al 2019
Replicating DeepMind StarCraft II Reinforcement Learning Benchmark with Actor-Critic Methods, Roman Ring 2018

Additional References
Relational Deep Reinforcement Learning, Zambaldi et al 2018
StarCraft II: A New Challenge for Reinforcement Learning, Vinyals et al 2017
Safe and Efficient Off-Policy Reinforcement Learning [Retrace(λ)], Munos et al 2016
Sample Efficient Actor-Critic with Experience Replay [ACER], Wang et al 2016
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures [IMPALA/V-trace], Espeholt et al 2018
Dec 6, 2020 • 54min

Shimon Whiteson

Shimon Whiteson is a Professor of Computer Science at Oxford University, the head of WhiRL, the Whiteson Research Lab at Oxford, and Head of Research at Waymo UK.

Featured References
VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
  Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson
Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
  Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson

Additional References
Shimon Whiteson - Multi-agent RL, MIT Embodied Intelligence Seminar
The StarCraft Multi-Agent Challenge, Samvelyan et al 2019
Direct Policy Transfer with Hidden Parameter Markov Decision Processes, Yao et al 2018
Value-Decomposition Networks For Cooperative Multi-Agent Learning, Sunehag et al 2017
Whiteson Research Lab
Waymo acquires Latent Logic to accelerate progress towards safe, driverless vehicles, Oxford News
Waymo
Sep 21, 2020 • 1h 25min

Aravind Srinivas

Aravind Srinivas, a PhD student at UC Berkeley, discusses the importance of learning better representations of data. They explore Contrastive Predictive Coding (CPC) and its applications, contrastive learning in reinforcement learning, encoding frames, the evolution of unsupervised learning, ongoing projects in the lab, and reinforcement learning papers and industry developments.
Aug 17, 2020 • 1h 30min

Taylor Killian

Taylor Killian is a Ph.D. student at the University of Toronto and the Vector Institute, and an Intern at Google Brain.

Featured References
Direct Policy Transfer with Hidden Parameter Markov Decision Processes
  Yao, Killian, Konidaris, Doshi-Velez
Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes
  Killian, Daulton, Konidaris, Doshi-Velez
Transfer Learning Across Patient Variations with Hidden Parameter Markov Decision Processes
  Killian, Konidaris, Doshi-Velez
Counterfactually Guided Policy Transfer in Clinical Settings
  Killian, Ghassemi, Joshi

Additional References
Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations, Doshi-Velez, Konidaris
MIMIC-III, a freely accessible critical care database. Johnson AEW, Pollard TJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, and Mark RG
The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care, Komorowski et al
