75 - Reinforcement / Imitation Learning in NLP, with Hal Daumé III

NLP Highlights

RESLOPE: An Inverse Reinforcement Learning Approach

In inverse reinforcement learning, I assume that the agent executing this behavior is near optimal for some reward function, and then I try to reverse engineer what that reward function was. In RESLOPE, the data we're doing this with is a reward that you only get at the end. If you take a lot of standard reinforcement learning algorithms and force them to observe reward only at the end, rather than incrementally as they go along, the problem becomes much harder.
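
To make the "reward only at the end" setting concrete, here is a minimal Python sketch. It is not RESLOPE itself, only an illustration of what a learner sees once incremental feedback is hidden. The function name `to_terminal_only` and the example reward values are hypothetical, chosen for illustration.

```python
from typing import List


def to_terminal_only(step_rewards: List[float]) -> List[float]:
    """Hide incremental feedback from the learner.

    Every step reports a reward of 0.0 except the last one,
    which reports the total reward of the whole episode.
    """
    if not step_rewards:
        return []
    total = sum(step_rewards)
    return [0.0] * (len(step_rewards) - 1) + [total]


# Hypothetical per-step rewards for one four-step episode.
dense = [1.0, 0.0, -1.0, 2.0]
sparse = to_terminal_only(dense)
print(sparse)  # [0.0, 0.0, 0.0, 2.0]
```

Both sequences carry the same total reward, but in the second one the learner can no longer tell which step earned or lost it. That credit assignment problem is what makes the end-of-episode setting harder.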
