

Pierluca D'Oro and Martin Klissarov
Nov 13, 2023
Pierluca D'Oro and Martin Klissarov discuss their recent work, 'Motif: Intrinsic Motivation from AI Feedback', and its application to NetHack. They also explore the similarities between reinforcement learning (RL) and learning from preferences, the challenges of training an RL agent for NetHack, the gap between RL and language models, and the difference between return and loss landscapes in RL.
Chapters
00:00 • Introduction (2 min)
01:48 • Exploring Intrinsic Rewards in Reinforcement Learning (5 min)
06:35 • Similarities between Reinforcement Learning and Learning from Preferences (2 min)
08:50 • Challenges of Training an RL Agent for NetHack (12 min)
20:55 • Analyzing the Gap between RL and Language Models (31 min)
51:39 • Return Landscape versus Loss Landscape in Reinforcement Learning (6 min)