
TalkRL: The Reinforcement Learning Podcast
TalkRL podcast is All Reinforcement Learning, All the Time.
In-depth interviews with brilliant people at the forefront of RL research and practice.
Guests from places like MILA, OpenAI, MIT, DeepMind, Berkeley, Amii, Oxford, Google Research, Brown, Waymo, Caltech, and Vector Institute.
Hosted by Robin Ranjit Singh Chauhan.
Latest episodes

Dec 6, 2020 • 54min
Shimon Whiteson
Shimon Whiteson is a Professor of Computer Science at Oxford University, head of WhiRL (the Whiteson Research Lab at Oxford), and Head of Research at Waymo UK.
Featured References
VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning, Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson
Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning, Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson
Additional References
Shimon Whiteson - Multi-agent RL, MIT Embodied Intelligence Seminar
The StarCraft Multi-Agent Challenge, Samvelyan et al 2019
Direct Policy Transfer with Hidden Parameter Markov Decision Processes, Yao et al 2018
Value-Decomposition Networks For Cooperative Multi-Agent Learning, Sunehag et al 2017
Whiteson Research Lab
Waymo acquires Latent Logic to accelerate progress towards safe, driverless vehicles, Oxford News
Waymo

Sep 21, 2020 • 1h 25min
Aravind Srinivas
Aravind Srinivas, a PhD student at UC Berkeley, discusses the importance of learning better representations of data. The conversation covers Contrastive Predictive Coding (CPC) and its applications, contrastive learning in reinforcement learning, encoding frames, the evolution of unsupervised learning, ongoing projects in his lab, and recent reinforcement learning papers and industry developments.
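For context on the contrastive methods discussed in this episode, here is a minimal Python sketch of an InfoNCE-style contrastive loss, the objective underlying CPC and contrastive RL methods. The function name, shapes, and temperature are illustrative assumptions, not code from the episode or papers.

```python
import torch
import torch.nn.functional as F

# A minimal InfoNCE-style contrastive loss sketch (assumed setup, not
# code from the episode). Each anchor embedding is matched against its
# own positive; every other row in the batch serves as a negative.

def info_nce_loss(anchors, positives, temperature=0.1):
    """anchors, positives: (batch, dim) embeddings of two views of the
    same frames (e.g. two random crops)."""
    anchors = F.normalize(anchors, dim=1)
    positives = F.normalize(positives, dim=1)
    # Pairwise similarity logits: entry (i, j) scores anchor i vs positive j.
    logits = anchors @ positives.t() / temperature
    # The correct "class" for anchor i is column i (its own positive).
    labels = torch.arange(anchors.size(0))
    return F.cross_entropy(logits, labels)
```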

Aug 17, 2020 • 1h 30min
Taylor Killian
Taylor Killian is a Ph.D. student at the University of Toronto and the Vector Institute, and an intern at Google Brain.
Featured References
Direct Policy Transfer with Hidden Parameter Markov Decision Processes, Yao, Killian, Konidaris, Doshi-Velez
Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes, Killian, Daulton, Konidaris, Doshi-Velez
Transfer Learning Across Patient Variations with Hidden Parameter Markov Decision Processes, Killian, Konidaris, Doshi-Velez
Counterfactually Guided Policy Transfer in Clinical Settings, Killian, Ghassemi, Joshi
Additional References
Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations, Doshi-Velez, Konidaris
MIMIC-III, a freely accessible critical care database, Johnson AEW, Pollard TJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, and Mark RG
The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care, Komorowski et al

Jul 6, 2020 • 1h 12min
Nan Jiang
Nan Jiang is an Assistant Professor of Computer Science at the University of Illinois. He was a postdoc at Microsoft Research, and did his PhD at the University of Michigan under Professor Satinder Singh.
Featured References
Reinforcement Learning: Theory and Algorithms, Alekh Agarwal, Nan Jiang, Sham M. Kakade
Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches, Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford
Information-Theoretic Considerations in Batch Reinforcement Learning, Jinglin Chen, Nan Jiang
Additional References
Towards a Unified Theory of State Abstraction for MDPs, Lihong Li, Thomas J. Walsh, Michael L. Littman
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, Nan Jiang, Lihong Li
Minimax Confidence Interval for Off-Policy Evaluation and Policy Optimization, Nan Jiang, Jiawei Huang
Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning, Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue
Errata
[Robin] I misspoke when I said that in domain randomization we want the agent to "ignore" domain parameters. What I should have said is that we want the agent to perform well within some range of domain parameters; it should be robust with respect to them.
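To make the corrected claim concrete, here is a minimal Python sketch of domain randomization under stated assumptions: the `RandomizedEnv` wrapper, the `set_mass` setter, and the mass range are hypothetical illustrations, not from any referenced paper. A single policy trained through this wrapper must perform well across the whole parameter range rather than being tuned to one value.

```python
import random

# A minimal sketch of domain randomization (assumed setup). The errata's
# point: the agent should be robust across a *range* of domain
# parameters, not taught to ignore them.

class RandomizedEnv:
    """Wraps a base env and resamples a domain parameter at every reset."""

    def __init__(self, base_env, mass_range=(0.5, 2.0)):
        self.base_env = base_env
        self.mass_range = mass_range  # assumed range of the hidden parameter

    def reset(self):
        # Sample a fresh domain parameter each episode, so one policy
        # must cope with the whole range of dynamics.
        mass = random.uniform(*self.mass_range)
        self.base_env.set_mass(mass)  # hypothetical setter on the base env
        return self.base_env.reset()

    def step(self, action):
        return self.base_env.step(action)
```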

May 14, 2020 • 2h
Danijar Hafner
Danijar Hafner is a PhD student at the University of Toronto, and a student researcher at Google Research, Brain Team and the Vector Institute. He holds a Master of Research from University College London.
Featured References
A deep learning framework for neuroscience, Blake A. Richards, Timothy P. Lillicrap, Philippe Beaudoin, Yoshua Bengio, Rafal Bogacz, Amelia Christensen, Claudia Clopath, Rui Ponte Costa, Archy de Berker, Surya Ganguli, Colleen J. Gillon, Danijar Hafner, Adam Kepecs, Nikolaus Kriegeskorte, Peter Latham, Grace W. Lindsay, Kenneth D. Miller, Richard Naud, Christopher C. Pack, Panayiota Poirazi, Pieter Roelfsema, João Sacramento, Andrew Saxe, Benjamin Scellier, Anna C. Schapiro, Walter Senn, Greg Wayne, Daniel Yamins, Friedemann Zenke, Joel Zylberberg, Denis Therien, Konrad P. Kording
Learning Latent Dynamics for Planning from Pixels, Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson
Dream to Control: Learning Behaviors by Latent Imagination, Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi
Planning to Explore via Self-Supervised World Models, Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak
Additional References
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, Schrittwieser et al
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, Silver et al
Shaping Belief States with Generative Environment Models for RL, Gregor et al
Model-Based Active Exploration, Shyam et al
Errata
[Robin] Around 1:37 I say "some ... world models get confused by random noise". I meant "some curiosity formulations", not "world models".

Apr 5, 2020 • 49min
Csaba Szepesvari
Csaba Szepesvari is:
Head of the Foundations Team at DeepMind
Professor of Computer Science at the University of Alberta
Canada CIFAR AI Chair
Fellow at the Alberta Machine Intelligence Institute
Co-author, with Tor Lattimore, of the book Bandit Algorithms, and author of the book Algorithms for Reinforcement Learning
References
Bandit based Monte-Carlo planning, Levente Kocsis, Csaba Szepesvári
Bandit Algorithms, Tor Lattimore, Csaba Szepesvári
Algorithms for Reinforcement Learning, Csaba Szepesvári
The Predictron: End-To-End Learning and Planning, David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris
A Bayesian framework for reinforcement learning, Strens
Solving Rubik’s Cube with a Robot Hand, OpenAI: Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang
The Nonstochastic Multiarmed Bandit Problem, Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, Robert E. Schapire
Deep Learning with Bayesian Principles, Mohammad Emtiyaz Khan
Tackling Climate Change with Machine Learning, David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, Alexandra Luccioni, Tegan Maharaj, Evan D. Sherwin, S. Karthik Mukkavilli, Konrad P. Kording, Carla Gomes, Andrew Y. Ng, Demis Hassabis, John C. Platt, Felix Creutzig, Jennifer Chayes, Yoshua Bengio

Mar 30, 2020 • 49min
Ben Eysenbach
Ben Eysenbach is a PhD student in the Machine Learning Department at Carnegie Mellon University. He was a Resident at Google Brain, and studied math and computer science at MIT. He co-founded the ICML Exploration in Reinforcement Learning workshop.
Featured References
Diversity is All You Need: Learning Skills without a Reward Function, Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine
Search on the Replay Buffer: Bridging Planning and Reinforcement Learning, Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine
Additional References
Behaviour Suite for Reinforcement Learning, Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado Van Hasselt
Learning Latent Plans from Play, Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, Pierre Sermanet
Finale Doshi-Velez
Emma Brunskill
Closed-loop optimization of fast-charging protocols for batteries with machine learning, Peter Attia, Aditya Grover, Norman Jin, Kristen Severson, Todor Markov, Yang-Hung Liao, Michael Chen, Bryan Cheong, Nicholas Perkins, Zi Yang, Patrick Herring, Muratahan Aykol, Stephen Harris, Richard Braatz, Stefano Ermon, William Chueh
CMU 10-703 Deep Reinforcement Learning, Fall 2019, Carnegie Mellon University
ICML Exploration in Reinforcement Learning workshop

Dec 20, 2019 • 56min
NeurIPS 2019 Deep RL Workshop
Thank you to all the presenters who participated. I covered as many as I could given the time and crowds; if you were not included and wish to be, please email talkrl@pathwayi.com. More details on the official NeurIPS Deep RL Workshop site.
0:23 Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms; Matthia Sabatelli (University of Liège); Gilles Louppe (University of Liège); Pierre Geurts (University of Liège); Marco Wiering (University of Groningen) [external pdf link]
4:16 Single Deep Counterfactual Regret Minimization; Eric Steinberger (University of Cambridge)
5:38 On the Convergence of Episodic Reinforcement Learning Algorithms at the Example of RUDDER; Markus Holzleitner (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria); José Arjona-Medina (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria); Marius-Constantin Dinu (LIT AI Lab / University Linz); Sepp Hochreiter (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria)
9:33 Objective Mismatch in Model-based Reinforcement Learning; Nathan Lambert (UC Berkeley); Brandon Amos (Facebook); Omry Yadan (Facebook); Roberto Calandra (Facebook)
10:51 Option Discovery using Deep Skill Chaining; Akhil Bagaria (Brown University); George Konidaris (Brown University)
13:44 Blue River Controls: A toolkit for Reinforcement Learning Control Systems on Hardware; Kirill Polzounov (University of Calgary); Ramitha Sundar (Blue River Technology); Lee Reden (Blue River Technology)
14:52 LeDeepChef: Deep Reinforcement Learning Agent for Families of Text-Based Games; Leonard Adolphs (ETH Zurich); Thomas Hofmann (ETH Zurich)
16:30 Accelerating Training in Pommerman with Imitation and Reinforcement Learning; Hardik Meisheri (TCS Research); Omkar Shelke (TCS Research); Richa Verma (TCS Research); Harshad Khadilkar (TCS Research)
17:27 Dream to Control: Learning Behaviors by Latent Imagination; Danijar Hafner (Google); Timothy Lillicrap (DeepMind); Jimmy Ba (University of Toronto); Mohammad Norouzi (Google Brain) [external pdf link]
20:48 Adaptive Temperature Tuning for Mellowmax in Deep Reinforcement Learning; Seungchan Kim (Brown University); George Konidaris (Brown University)
22:05 Meta-learning curiosity algorithms; Ferran Alet (MIT); Martin Schneider (MIT); Tomas Lozano-Perez (MIT); Leslie Kaelbling (MIT)
24:09 Predictive Coding for Boosting Deep Reinforcement Learning with Sparse Rewards; Xingyu Lu (UC Berkeley); Stas Tiomkin (BAIR, UC Berkeley); Pieter Abbeel (UC Berkeley)
25:44 Swarm-inspired Reinforcement Learning via Collaborative Inter-agent Knowledge Distillation; Zhang-Wei Hong (Preferred Networks); Prabhat Nagarajan (Preferred Networks); Guilherme Maeda (Preferred Networks)
26:35 Multiplayer AlphaZero; Nicholas Petosa (Georgia Institute of Technology); Tucker Balch (Georgia Institute of Technology) [external pdf link]
27:43 Prioritized Sequence Experience Replay; Marc Brittain (Iowa State University); Joshua Bertram (Iowa State University); Xuxi Yang (Iowa State University); Peng Wei (Iowa State University) [external pdf link]
29:14 Recurrent neural-linear posterior sampling for non-stationary bandits; Paulo Rauber (IDSIA); Aditya Ramesh (USI); Jürgen Schmidhuber (IDSIA - Lugano)
29:36 Improving Evolutionary Strategies With Past Descent Directions; Asier Mujika (ETH Zurich); Florian Meier (ETH Zurich); Marcelo Matheus Gauy (ETH Zurich); Angelika Steger (ETH Zurich) [external pdf link]
31:40 ZPD Teaching Strategies for Deep Reinforcement Learning from Demonstrations; Daniel Seita (UC Berkeley); David Chan (UC Berkeley); Roshan Rao (UC Berkeley); Chen Tang (UC Berkeley); Mandi Zhao (UC Berkeley); John Canny (UC Berkeley) [external pdf link]
33:05 Bottom-Up Meta-Policy Search; Luckeciano Melo (Aeronautics Institute of Technology); Marcos Máximo (Aeronautics Institute of Technology); Adilson Cunha (Aeronautics Institute of Technology) [external pdf link]
33:37 MERL: Multi-Head Reinforcement Learning; Yannis Flet-Berliac (University of Lille / Inria); Philippe Preux (Inria) [external pdf link]
35:30 Emergen...

Nov 19, 2019 • 48min
Scott Fujimoto
Scott Fujimoto is a PhD student at McGill University and Mila. He is the author of TD3 (its core update is sketched after the references below) and of several recent works on batch deep reinforcement learning.
Featured References
Addressing Function Approximation Error in Actor-Critic Methods, Scott Fujimoto, Herke van Hoof, David Meger
Off-Policy Deep Reinforcement Learning without Exploration, Scott Fujimoto, David Meger, Doina Precup
Benchmarking Batch Deep Reinforcement Learning Algorithms, Scott Fujimoto, Edoardo Conti, Mohammad Ghavamzadeh, Joelle Pineau
Additional References
Striving for Simplicity in Off-Policy Deep Reinforcement Learning, Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog, Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard
Continuous control with deep reinforcement learning, Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra
Distributed Distributional Deterministic Policy Gradients, Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva TB, Alistair Muldal, Nicolas Heess, Timothy Lillicrap
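As promised above, here is a minimal sketch of the clipped double-Q target with target policy smoothing, the core update TD3 introduces in "Addressing Function Approximation Error in Actor-Critic Methods". The function signature and placeholder networks are illustrative assumptions, not the paper's released code.

```python
import torch

# A minimal sketch of TD3's target computation (assumed interfaces:
# q1_target, q2_target, actor_target are placeholder target networks).

def td3_target(q1_target, q2_target, actor_target,
               reward, next_state, done,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5):
    # Target policy smoothing: perturb the target action with clipped noise.
    mu = actor_target(next_state)
    noise = (torch.randn_like(mu) * policy_noise).clamp(-noise_clip, noise_clip)
    next_action = (mu + noise).clamp(-1.0, 1.0)
    # Clipped double Q-learning: take the minimum of the twin critics
    # to curb overestimation bias.
    next_q = torch.min(q1_target(next_state, next_action),
                       q2_target(next_state, next_action))
    return reward + gamma * (1.0 - done) * next_q
```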

Nov 12, 2019 • 1h 4min
Jessica Hamrick
Dr. Jessica Hamrick is a Research Scientist at DeepMind. She holds a PhD in Psychology from UC Berkeley. Featured References Structured agents for physical construction Victor Bapst, Alvaro Sanchez-Gonzalez, Carl Doersch, Kimberly L. Stachenfeld, Pushmeet Kohli, Peter W. Battaglia, Jessica B. Hamrick Analogues of mental simulation and imagination in deep learning Jessica Hamrick Additional References Metacontrol for Adaptive Imagination-Based Optimization Jessica B. Hamrick, Andrew J. Ballard, Razvan Pascanu, Oriol Vinyals, Nicolas Heess, Peter W. Battaglia Surprising Negative Results for Generative Adversarial Tree Search Kamyar Azizzadenesheli, Brandon Yang, Weitang Liu, Zachary C Lipton, Animashree Anandkumar Metareasoning and Mental Simulation Jessica B. Hamrick Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis Object-oriented state editing for HRL Victor Bapst, Alvaro Sanchez-Gonzalez, Omar Shams, Kimberly Stachenfeld, Peter W. Battaglia, Satinder Singh, Jessica B. Hamrick FeUdal Networks for Hierarchical Reinforcement Learning Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu PILCO: A Model-Based and Data-Efficient Approach to Policy Search Marc Peter Deisenroth, Carl Edward Rasmussen Blueberry Earth Anders Sandberg