
[08] He He - Sequential Decisions and Predictions in NLP
The Thesis Review
Coaching: A Way to Improve Text Generation
The idea of coaching is: in this case, can I project the oracle's policy into the learner's hypothesis space? It's not really a projection; it's more like interpolation. But the intuition is that you want to derive targets that are reachable by the learner, ones that are not too hard in some sense. It's also similar to trust-region-based methods, where you interpolate with the old policy so that you don't take too large a step away from your current policy.
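One way to make the interpolation concrete is the minimal Python sketch below. It is an illustration under stated assumptions, not the speaker's implementation: the function name `coached_action`, the weight `lam`, and the toy scores are all hypothetical, and the scoring rule (learner score weighted by `lam`, minus the oracle's loss) is one assumed form of the interpolation described above.

```python
def coached_action(learner_scores, oracle_losses, lam):
    """Return the index of the action maximizing lam * learner_score - oracle_loss.

    Assumed form of the interpolation: lam = 0 reproduces the oracle's choice
    (lowest loss); larger lam pulls the target toward what the learner already
    prefers.
    """
    if len(learner_scores) != len(oracle_losses):
        raise ValueError("need one learner score and one oracle loss per action")
    return max(
        range(len(learner_scores)),
        key=lambda a: lam * learner_scores[a] - oracle_losses[a],
    )


# Hypothetical toy values for three candidate actions.
learner_scores = [2.0, 1.5, -1.0]  # the learner currently prefers action 0
oracle_losses = [1.0, 0.2, 0.0]    # the oracle prefers action 2 (zero loss)

for lam in (0.0, 1.0, 10.0):
    print(lam, coached_action(learner_scores, oracle_losses, lam))
```

With these toy numbers, an intermediate `lam` selects action 1: it is slightly worse than the oracle's choice but much closer to what the learner already scores highly, which is the "not too hard" target the excerpt describes.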