TalkRL: The Reinforcement Learning Podcast

Robin Ranjit Singh Chauhan
Feb 12, 2024 • 41min

Sharath Chandra Raparthy

Sharath Chandra Raparthy, an AI Resident at FAIR at Meta, discusses in-context learning for sequential decision tasks, training models to adapt to unseen tasks and randomized environments, the properties of data that enable in-context learning, burstiness and trajectories in transformers, and the use of GFlowNets for sampling from complex distributions.
Nov 13, 2023 • 57min

Pierluca D'Oro and Martin Klissarov

Pierluca D'Oro and Martin Klissarov discuss their recent work on 'Motif: Intrinsic Motivation from AI Feedback' and its application to NetHack. They also explore the similarities between RL and learning from preferences, the challenges of training an RL agent for NetHack, the gap between RL and language models, and the difference between return and loss landscapes in RL.
Aug 22, 2023 • 1h 14min

Martin Riedmiller

Martin Riedmiller, a research scientist and team lead at DeepMind, discusses using reinforcement learning to control the magnetic field in a fusion reactor. He explores challenges in the tokamak control project, reward design, designing actor and critic networks, the DQN and NFQ algorithms, the importance of explainability in RL systems, and the Horde architecture for collecting experience.
Aug 8, 2023 • 1h 10min

Max Schwarzer

Max Schwarzer is a PhD student at Mila with Aaron Courville and Marc Bellemare, interested in RL scaling, representation learning for RL, and RL for science. Max spent the last 1.5 years at Google Brain/DeepMind, and is now at Apple Machine Learning Research.

Featured References

Bigger, Better, Faster: Human-level Atari with human-level efficiency
Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro

Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier
Pierluca D'Oro, Max Schwarzer, Evgenii Nikishin, Pierre-Luc Bacon, Marc G. Bellemare, Aaron Courville

The Primacy Bias in Deep Reinforcement Learning
Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville

Additional References

Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al. 2017
When to use parametric models in reinforcement learning?, van Hasselt et al. 2019
Data-Efficient Reinforcement Learning with Self-Predictive Representations, Schwarzer et al. 2020
Pretraining Representations for Data-Efficient Reinforcement Learning, Schwarzer et al. 2021
Jul 25, 2023 • 40min

Julian Togelius

Julian Togelius is an Associate Professor of Computer Science and Engineering at NYU, and cofounder and research director at modl.ai.

Featured References

Choose Your Weapon: Survival Strategies for Depressed AI Academics
Julian Togelius, Georgios N. Yannakakis

Learning Controllable 3D Level Generators
Zehua Jiang, Sam Earle, Michael Cerny Green, Julian Togelius

PCGRL: Procedural Content Generation via Reinforcement Learning
Ahmed Khalifa, Philip Bontrager, Sam Earle, Julian Togelius

Illuminating Generalization in Deep Reinforcement Learning through Procedural Level Generation
Niels Justesen, Ruben Rodriguez Torrado, Philip Bontrager, Ahmed Khalifa, Julian Togelius, Sebastian Risi
May 8, 2023 • 1h 4min

Jakob Foerster

Jakob Foerster on multi-agent learning, cooperation vs competition, emergent communication, zero-shot coordination, opponent shaping, agents for Hanabi and the Prisoner's Dilemma, and more. Jakob Foerster is an Associate Professor at the University of Oxford.

Featured References

Learning with Opponent-Learning Awareness
Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch

Model-Free Opponent Shaping
Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster

Off-Belief Learning
Hengyuan Hu, Adam Lerer, Brandon Cui, David Wu, Luis Pineda, Noam Brown, Jakob Foerster

Learning to Communicate with Deep Multi-Agent Reinforcement Learning
Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson

Adversarial Cheap Talk
Chris Lu, Timon Willi, Alistair Letcher, Jakob Foerster

Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning
Yat Long Lo, Christian Schroeder de Witt, Samuel Sokota, Jakob Nicolaus Foerster, Shimon Whiteson

Additional References

Lectures by Jakob on YouTube
Apr 12, 2023 • 45min

Danijar Hafner 2

Danijar Hafner on the DreamerV3 agent and world models, the Director agent and hierarchical RL, real-time RL on robots with DayDreamer, and his framework for unsupervised agent design! Danijar Hafner is a PhD candidate at the University of Toronto with Jimmy Ba, a visiting student at UC Berkeley with Pieter Abbeel, and an intern at DeepMind. He was previously our guest on episode 11.

Featured References

Mastering Diverse Domains through World Models (DreamerV3) [blog]
Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

DayDreamer: World Models for Physical Robot Learning [blog]
Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel

Deep Hierarchical Planning from Pixels [blog]
Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel

Action and Perception as Divergence Minimization [blog]
Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess

Additional References

Mastering Atari with Discrete World Models (DreamerV2) [blog]
Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

Dream to Control: Learning Behaviors by Latent Imagination (Dreamer) [blog]
Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi

Planning to Explore via Self-Supervised World Models
Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak
Mar 27, 2023 • 1h 11min

Jeff Clune

AI-generating algorithms, learning to play Minecraft with Video PreTraining (VPT), Go-Explore for hard exploration, POET and open-endedness, AI-GAs and ChatGPT, AGI predictions, and lots more! Jeff Clune is an Associate Professor of Computer Science at the University of British Columbia, a Canada CIFAR AI Chair and Faculty Member at the Vector Institute, and Senior Research Advisor at DeepMind.

Featured References

Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos [Blog Post]
Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune

Robots that can adapt like animals
Antoine Cully, Jeff Clune, Danesh Tarapore, Jean-Baptiste Mouret

Illuminating search spaces by mapping elites
Jean-Baptiste Mouret, Jeff Clune

Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions
Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, Kenneth O. Stanley

Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions
Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley

First return, then explore
Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, Jeff Clune
Mar 14, 2023 • 46min

Natasha Jaques 2

Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, industry vs academia, PsiPhi-Learning, AGI, and more! Dr. Natasha Jaques is a Senior Research Scientist at Google Brain.

Featured References

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard

Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control
Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck

PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning
Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar

Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience
Marwa Abdulhai, Natasha Jaques, Sergey Levine

Additional References

Fine-Tuning Language Models from Human Preferences, Daniel M. Ziegler et al. 2019
Learning to summarize from human feedback, Nisan Stiennon et al. 2020
Training language models to follow instructions with human feedback, Long Ouyang et al. 2022
Mar 7, 2023 • 1h 7min

Jacob Beck and Risto Vuorio

Jacob Beck and Risto Vuorio on their recent Survey of Meta-Reinforcement Learning. Jacob and Risto are PhD students at the Whiteson Research Lab at the University of Oxford.

Featured Reference

A Survey of Meta-Reinforcement Learning
Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, Shimon Whiteson

Additional References

VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning, Luisa Zintgraf et al.
Mastering Diverse Domains through World Models (DreamerV3), Hafner et al.
Unsupervised Meta-Learning for Reinforcement Learning, Gupta et al.
Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices (DREAM), Liu et al.
RL2: Fast Reinforcement Learning via Slow Reinforcement Learning, Duan et al.
Learning to reinforcement learn, Wang et al.
