TalkRL: The Reinforcement Learning Podcast

Robin Ranjit Singh Chauhan
Mar 22, 2021 • 51min

Nathan Lambert

Nathan Lambert is a PhD Candidate at UC Berkeley.

Featured References

Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning, Nathan O. Lambert, Albert Wilcox, Howard Zhang, Kristofer S. J. Pister, Roberto Calandra
Objective Mismatch in Model-based Reinforcement Learning, Nathan Lambert, Brandon Amos, Omry Yadan, Roberto Calandra
Low Level Control of a Quadrotor with Deep Model-Based Reinforcement Learning, Nathan O. Lambert, Daniel S. Drew, Joseph Yaconelli, Roberto Calandra, Sergey Levine, Kristofer S. J. Pister
On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning, Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra

Additional References

Nathan Lambert's blog
Nathan Lambert on Google Scholar
Mar 16, 2021 • 46min

Kai Arulkumaran

Kai Arulkumaran is a researcher at Araya in Tokyo.

Featured References

AlphaStar: An Evolutionary Computation Perspective, Kai Arulkumaran, Antoine Cully, Julian Togelius
Analysing Deep Reinforcement Learning Agents Trained with Domain Randomisation, Tianhong Dai, Kai Arulkumaran, Tamara Gerbert, Samyakh Tukra, Feryal Behbahani, Anil Anthony Bharath
Training Agents using Upside-Down Reinforcement Learning, Rupesh Kumar Srivastava, Pranav Shyam, Filipe Mutz, Wojciech Jaśkowski, Jürgen Schmidhuber

Additional References

Araya
NNAISENSE
Kai Arulkumaran on Google Scholar
https://github.com/Kaixhin/rlenvs
https://github.com/Kaixhin/Atari
https://github.com/Kaixhin/Rainbow
Tschiatschek, S., Arulkumaran, K., Stühmer, J. & Hofmann, K. (2018). Variational Inference for Data-Efficient Model Learning in POMDPs. arXiv:1805.09281.
Arulkumaran, K., Dilokthanakul, N., Shanahan, M. & Bharath, A. A. (2016). Classifying Options for Deep Reinforcement Learning. International Joint Conference on Artificial Intelligence, Deep Reinforcement Learning Workshop.
Garnelo, M., Arulkumaran, K. & Shanahan, M. (2016). Towards Deep Symbolic Reinforcement Learning. Annual Conference on Neural Information Processing Systems, Deep Reinforcement Learning Workshop.
Arulkumaran, K., Deisenroth, M. P., Brundage, M. & Bharath, A. A. (2017). Deep Reinforcement Learning: A Brief Survey. IEEE Signal Processing Magazine.
Agostinelli, A., Arulkumaran, K., Sarrico, M., Richemond, P. & Bharath, A. A. (2019). Memory-Efficient Episodic Control Reinforcement Learning with Dynamic Online k-means. Annual Conference on Neural Information Processing Systems, Workshop on Biological and Artificial Reinforcement Learning.
Sarrico, M., Arulkumaran, K., Agostinelli, A., Richemond, P. & Bharath, A. A. (2019). Sample-Efficient Reinforcement Learning with Maximum Entropy Mellowmax Episodic Control. Annual Conference on Neural Information Processing Systems, Workshop on Biological and Artificial Reinforcement Learning.
Jan 26, 2021 • 1h 1min

Michael Dennis

Michael Dennis is a PhD student at the Center for Human-Compatible AI at UC Berkeley, supervised by Professor Stuart Russell.

"I'm interested in robustness in RL and multi-agent RL, specifically as it applies to making the interaction between AI systems and society at large more beneficial." -- Michael Dennis

Featured References

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [PAIRED], Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine (Videos)
Adversarial Policies: Attacking Deep Reinforcement Learning, Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell (Homepage and Videos)
Accumulating Risk Capital Through Investing in Cooperation, Charlotte Roman, Michael Dennis, Andrew Critch, Stuart Russell
Quantifying Differences in Reward Functions [EPIC], Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike

Additional References

Safe Opponent Exploitation, Sam Ganzfried and Tuomas Sandholm, 2015
Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning, Natasha Jaques et al, 2019
Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research, Leibo et al, 2019
Leveraging Procedural Generation to Benchmark Reinforcement Learning, Karl Cobbe et al, 2019
Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions, Wang et al, 2019
Consequences of Misaligned AI, Zhuang et al, 2020
Conservative Agency via Attainable Utility Preservation, Turner et al, 2019
Jan 11, 2021 • 42min

Roman Ring

Roman Ring is a Research Engineer at DeepMind.

Featured References

Grandmaster level in StarCraft II using multi-agent reinforcement learning, Vinyals et al, 2019
Replicating DeepMind StarCraft II Reinforcement Learning Benchmark with Actor-Critic Methods, Roman Ring, 2018

Additional References

Relational Deep Reinforcement Learning, Zambaldi et al, 2018
StarCraft II: A New Challenge for Reinforcement Learning, Vinyals et al, 2017
Safe and Efficient Off-Policy Reinforcement Learning [Retrace(λ)], Munos et al, 2016
Sample Efficient Actor-Critic with Experience Replay [ACER], Wang et al, 2016
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures [IMPALA/V-trace], Espeholt et al, 2018
Dec 6, 2020 • 54min

Shimon Whiteson

Shimon Whiteson is a Professor of Computer Science at Oxford University, head of WhiRL (the Whiteson Research Lab at Oxford), and Head of Research at Waymo UK.

Featured References

VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning, Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson
Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning, Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson

Additional References

Shimon Whiteson - Multi-agent RL, MIT Embodied Intelligence Seminar
The StarCraft Multi-Agent Challenge, Samvelyan et al, 2019
Direct Policy Transfer with Hidden Parameter Markov Decision Processes, Yao et al, 2018
Value-Decomposition Networks For Cooperative Multi-Agent Learning, Sunehag et al, 2017
Whiteson Research Lab
Waymo acquires Latent Logic to accelerate progress towards safe, driverless vehicles, Oxford News
Waymo
Sep 21, 2020 • 1h 25min

Aravind Srinivas

Aravind Srinivas is a PhD student at UC Berkeley. He discusses the importance of learning better representations of data, covering Contrastive Predictive Coding (CPC) and its applications, contrastive learning in reinforcement learning, encoding frames, the evolution of unsupervised learning, ongoing projects in his lab, and recent reinforcement learning papers and industry developments.
Aug 17, 2020 • 1h 30min

Taylor Killian

Taylor Killian is a Ph.D. student at the University of Toronto and the Vector Institute, and an intern at Google Brain.

Featured References

Direct Policy Transfer with Hidden Parameter Markov Decision Processes, Yao, Killian, Konidaris, Doshi-Velez
Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes, Killian, Daulton, Konidaris, Doshi-Velez
Transfer Learning Across Patient Variations with Hidden Parameter Markov Decision Processes, Killian, Konidaris, Doshi-Velez
Counterfactually Guided Policy Transfer in Clinical Settings, Killian, Ghassemi, Joshi

Additional References

Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations, Doshi-Velez, Konidaris
MIMIC-III, a freely accessible critical care database, Johnson AEW, Pollard TJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, Mark RG
The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care, Komorowski et al
Jul 6, 2020 • 1h 12min

Nan Jiang

Nan Jiang is an Assistant Professor of Computer Science at the University of Illinois. He was a postdoc at Microsoft Research, and did his PhD at the University of Michigan under Professor Satinder Singh.

Featured References

Reinforcement Learning: Theory and Algorithms, Alekh Agarwal, Nan Jiang, Sham M. Kakade
Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches, Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford
Information-Theoretic Considerations in Batch Reinforcement Learning, Jinglin Chen, Nan Jiang

Additional References

Towards a Unified Theory of State Abstraction for MDPs, Lihong Li, Thomas J. Walsh, Michael L. Littman
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, Nan Jiang, Lihong Li
Minimax Confidence Interval for Off-Policy Evaluation and Policy Optimization, Nan Jiang, Jiawei Huang
Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning, Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue

Errata

[Robin] I misspoke when I said that in domain randomization we want the agent to "ignore" domain parameters. What I should have said is that we want the agent to perform well within some range of domain parameters; it should be robust with respect to domain parameters.
May 14, 2020 • 2h

Danijar Hafner

Danijar Hafner is a PhD student at the University of Toronto, and a student researcher at Google Research, Brain Team and the Vector Institute. He holds a Master of Research from University College London.

Featured References

A deep learning framework for neuroscience, Blake A. Richards, Timothy P. Lillicrap, Philippe Beaudoin, Yoshua Bengio, Rafal Bogacz, Amelia Christensen, Claudia Clopath, Rui Ponte Costa, Archy de Berker, Surya Ganguli, Colleen J. Gillon, Danijar Hafner, Adam Kepecs, Nikolaus Kriegeskorte, Peter Latham, Grace W. Lindsay, Kenneth D. Miller, Richard Naud, Christopher C. Pack, Panayiota Poirazi, Pieter Roelfsema, João Sacramento, Andrew Saxe, Benjamin Scellier, Anna C. Schapiro, Walter Senn, Greg Wayne, Daniel Yamins, Friedemann Zenke, Joel Zylberberg, Denis Therien, Konrad P. Kording
Learning Latent Dynamics for Planning from Pixels, Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson
Dream to Control: Learning Behaviors by Latent Imagination, Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi
Planning to Explore via Self-Supervised World Models, Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak

Additional References

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, Schrittwieser et al
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, Silver et al
Shaping Belief States with Generative Environment Models for RL, Gregor et al
Model-Based Active Exploration, Shyam et al

Errata

[Robin] Around 1:37 I say "some ... world models get confused by random noise". I meant "some curiosity formulations", not "world models".
Apr 5, 2020 • 49min

Csaba Szepesvari

Csaba Szepesvari is:
Head of the Foundations Team at DeepMind
Professor of Computer Science at the University of Alberta
Canada CIFAR AI Chair
Fellow at the Alberta Machine Intelligence Institute
Co-author of the book Bandit Algorithms along with Tor Lattimore, and author of the book Algorithms for Reinforcement Learning

References

Bandit based Monte-Carlo Planning, Levente Kocsis, Csaba Szepesvári
Bandit Algorithms, Tor Lattimore, Csaba Szepesvári
Algorithms for Reinforcement Learning, Csaba Szepesvári
The Predictron: End-To-End Learning and Planning, David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris
A Bayesian framework for reinforcement learning, Strens
Solving Rubik's Cube with a Robot Hand (paper), OpenAI: Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang
The Nonstochastic Multiarmed Bandit Problem, Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, Robert E. Schapire
Deep Learning with Bayesian Principles, Mohammad Emtiyaz Khan
Tackling Climate Change with Machine Learning, David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, Alexandra Luccioni, Tegan Maharaj, Evan D. Sherwin, S. Karthik Mukkavilli, Konrad P. Kording, Carla Gomes, Andrew Y. Ng, Demis Hassabis, John C. Platt, Felix Creutzig, Jennifer Chayes, Yoshua Bengio
