
75 - Reinforcement / Imitation Learning in NLP, with Hal Daumé III
NLP Highlights
RESLOPE: An Inverse Reinforcement Learning Approach
In inverse reinforcement learning, I assume that the agent executing this behavior is near-optimal for some reward function, and then I try to reverse-engineer what that reward function was. In RESLOPE, the data we're learning from is a reward that you only get at the end of an episode. If you take a lot of standard reinforcement learning algorithms and force them to observe reward only at the end, rather than incrementally as they go along, the problem becomes much harder.
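To make the end-of-episode point concrete, here is a toy Python sketch. It is an illustration only, not RESLOPE and not code from the episode. It assumes a hypothetical four-step episode with made-up per-step rewards and undiscounted returns; the function names are invented for this example. When the same total reward is delivered only at the final step, every step receives the same return, so the learner gets no signal about which individual decision helped or hurt.

```python
from typing import List


def returns_to_go(rewards: List[float]) -> List[float]:
    """Undiscounted return from each step to the end of the episode."""
    out = []
    total = 0.0
    # Walk backwards, accumulating the reward still to come.
    for r in reversed(rewards):
        total += r
        out.append(total)
    return list(reversed(out))


def episodic_only(rewards: List[float]) -> List[float]:
    """Hide per-step rewards: deliver the episode total only at the last step."""
    if not rewards:
        return []
    return [0.0] * (len(rewards) - 1) + [sum(rewards)]


# Hypothetical per-step rewards for a four-step episode.
per_step = [1.0, 0.0, -1.0, 1.0]

print(returns_to_go(per_step))                 # [1.0, 0.0, 0.0, 1.0]
print(returns_to_go(episodic_only(per_step)))  # [1.0, 1.0, 1.0, 1.0]
```

With incremental reward, the returns differ from step to step, which tells the learner where things went well or badly. With reward only at the end, all four steps look identical, and the credit-assignment problem is left entirely to the algorithm.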