
[08] He He - Sequential Decisions and Predictions in NLP
The Thesis Review
Offline Reinforcement Learning for Text Generation
In many NLP problems, we don't have this luxury. Once you go off the oracle path, you don't know what the best actions to take are. So that's a big limitation of this algorithm, because it assumes that during training, whichever state you're in, you have access to the oracle. And in that case, if you're framing text generation as sequential decision making, for DAgger you can't necessarily have an oracle.
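To make the oracle assumption concrete, here is a minimal sketch of a DAgger-style loop on a toy generation task. It is an illustration under stated assumptions, not the method from the thesis: the three-token target, the lookup-table policy, and the names `oracle`, `train`, `act`, `rollout`, `dagger`, and `generate` are all hypothetical, and the roll-in always uses the learner's own policy, which simplifies the usual mixing with the oracle. The point to notice is that `oracle` must return a label for every state the learner wanders into, including states off the reference path.

```python
from collections import Counter, defaultdict

# Hypothetical toy task: reproduce TARGET one token at a time.
TARGET = ["the", "cat", "sat"]


def oracle(state):
    """Assumed oracle: returns the best next token from ANY state.

    This is exactly the luxury the passage says many NLP problems lack.
    """
    position, _previous_token = state
    return TARGET[position]


def train(dataset):
    """Fit a lookup-table policy: the majority oracle label per state."""
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}


def act(policy, state):
    # States never seen in training fall back to a default (wrong) token.
    return policy.get(state, "dog")


def rollout(policy):
    """Run the learner's own policy and record the states it visits."""
    states, previous = [], "<s>"
    for position in range(len(TARGET)):
        state = (position, previous)
        states.append(state)
        previous = act(policy, state)
    return states


def dagger(iterations):
    dataset, policy = [], {}
    for _ in range(iterations):
        visited = rollout(policy)
        # Query the oracle at the learner's states, on or off the oracle path.
        dataset += [(s, oracle(s)) for s in visited]
        # Retrain on the aggregated dataset.
        policy = train(dataset)
    return policy


def generate(policy):
    output, previous = [], "<s>"
    for position in range(len(TARGET)):
        previous = act(policy, (position, previous))
        output.append(previous)
    return output
```

In this sketch, `generate(dagger(1))` still makes a mistake at the second token, because the learner has not yet visited the state that follows a correct first token. `generate(dagger(3))` reproduces the target, since by then the oracle has labeled every state the learner reached. Without an oracle that can answer from arbitrary states, the labeling line inside `dagger` has nothing to call.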