

Pierluca D'Oro and Martin Klissarov
Nov 13, 2023
Pierluca D'Oro and Martin Klissarov discuss their recent work, 'Motif: Intrinsic Motivation from AI Feedback', and its application to NetHack. They also explore the similarities between reinforcement learning (RL) and learning from preferences, the challenges of training an RL agent for NetHack, the gap between RL and language models, and the difference between return and loss landscapes in RL.
Chapters
00:00 • Introduction (2 min)
01:48 • Exploring Intrinsic Rewards in Reinforcement Learning (5 min)
06:35 • Similarities between Reinforcement Learning and Learning from Preferences (2 min)
08:50 • Challenges of Training an RL Agent for NetHack (12 min)
20:55 • Analyzing the Gap between RL and Language Models (31 min)
51:39 • Return Landscape versus Loss Landscape in Reinforcement Learning (6 min)