Offline Reinforcement Learning for Text Generation

In many NLP problems, we don't have this luxury. Once you go off the oracle path, you don't know what the best actions to take are. So that's a big limitation of this algorithm, because it assumes that during training, whichever state you're in, you have access to the oracle. And if you're framing text generation as sequential decision making, then for DAgger you can't necessarily have an oracle.
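To make the limitation concrete, here is a minimal sketch of a DAgger-style loop on a toy generation task. Everything in it is an illustrative assumption rather than anything from the episode: the three-letter vocabulary, the reference string `TARGET`, the tabular policy, and above all `toy_oracle`, which can name the best next token for any prefix. That last function is exactly the luxury the speaker says is missing in many NLP problems.

```python
import random

VOCAB = ["a", "b", "c"]
TARGET = "abc"  # hypothetical reference output for the toy task


def toy_oracle(prefix):
    """Hypothetical expert: returns the best next token for ANY prefix,
    including prefixes that have already left the reference path."""
    return TARGET[len(prefix)]


def rollout(policy, rng):
    """Generate with the learner's own policy, recording each visited state."""
    prefix, states = "", []
    for _ in range(len(TARGET)):
        states.append(prefix)
        # In a state the learner has never seen, it guesses at random.
        action = policy.get(prefix) or rng.choice(VOCAB)
        prefix += action
    return prefix, states


def dagger(n_iters=20, seed=0):
    rng = random.Random(seed)
    dataset = {}  # aggregated mapping: state -> oracle action
    policy = {}
    for _ in range(n_iters):
        _, states = rollout(policy, rng)
        for state in states:
            # This is the step that needs an oracle in every visited state.
            dataset[state] = toy_oracle(state)
        # "Retraining" here is just memorizing the aggregated labels.
        policy = dict(dataset)
    return policy
```

In this toy setting the oracle is trivial to write because the correct token depends only on the position. For real text generation, a function that labels arbitrary off-path prefixes with the best continuation is usually not available, which is the assumption the transcript calls out.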

