
[08] He He - Sequential Decisions and Predictions in NLP
The Thesis Review
00:00
Coaching: A Way to Improve Text Generation
The idea of coaching is this: can I project the oracle's policy onto the learner's hypothesis space? It's not really a projection; it's more like an interpolation. But the intuition is that you want to derive something that is achievable by the learner, that is, actions that are not too hard for it to learn. It's also similar to trust-region-based methods, where you interpolate with the old policy so that you don't take a step too far away from your current policy.
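To make the interpolation idea concrete, here is a minimal sketch, not the method as described in the episode. It assumes an imitation-learning setup in which, at each state, a "coach" demonstrates the action that best trades off the oracle's loss against how highly the learner already scores that action. The function name `coach_action`, the weight `lam`, and the toy scores are all hypothetical illustrations.

```python
from typing import Callable, Sequence, TypeVar

A = TypeVar("A")


def coach_action(
    actions: Sequence[A],
    learner_score: Callable[[A], float],
    oracle_loss: Callable[[A], float],
    lam: float,
) -> A:
    """Return the action a hypothetical coach would demonstrate.

    Each candidate is ranked by lam * learner_score - oracle_loss.
    With lam = 0 this is the pure oracle (lowest loss); larger lam pulls
    the demonstration toward actions the learner already prefers, i.e.
    toward something the learner can actually reach.
    """
    return max(actions, key=lambda a: lam * learner_score(a) - oracle_loss(a))


# Toy example with made-up numbers (illustration only).
actions = ["a", "b", "c"]
learner = {"a": 0.0, "b": 2.0, "c": 5.0}  # learner's own preference per action
loss = {"a": 0.0, "b": 0.5, "c": 3.0}     # oracle's loss per action

print(coach_action(actions, learner.__getitem__, loss.__getitem__, 0.0))  # a
print(coach_action(actions, learner.__getitem__, loss.__getitem__, 0.5))  # b
```

With `lam = 0` the coach simply copies the oracle. With `lam = 0.5` it picks `"b"`, a compromise that is slightly worse than the oracle's choice but much closer to what the learner already does. This mirrors the trust-region intuition of not stepping too far from the current policy.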