[08] He He - Sequential Decisions and Predictions in NLP

The Thesis Review

CHAPTER

Coaching: A Way to Improve Text Generation

The idea of coaching is, in this case, can I sort of project the oracle's policy into the learner's hypothesis space? It's not really a projection; it's more like an interpolation. But the intuition is that you want to derive something that is reachable by the learner, that is not too hard in a sense. It's also similar to trust-region-based methods, where you want to interpolate with the old policy so that you don't take too large a step away from your current policy.
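One way to picture the interpolation described above is a small sketch in which the coach picks the action that maximizes a weighted learner score minus the oracle's loss. This is an illustration under stated assumptions, not necessarily the formulation used in the thesis: the function name `coach_action`, the weight `lam`, and the toy scores and losses are all hypothetical.

```python
def coach_action(learner_scores, oracle_losses, lam):
    """Pick the action maximizing lam * learner_score - oracle_loss.

    Assumed setup (hypothetical, for illustration only):
      learner_scores[a]: how strongly the current learner prefers action a
      oracle_losses[a]:  how costly the oracle considers action a (0 = best)
      lam:               interpolation weight; 0 recovers the oracle's choice,
                         a large value recovers the learner's own choice
    """
    actions = range(len(learner_scores))
    return max(actions, key=lambda a: lam * learner_scores[a] - oracle_losses[a])


# Toy example with three actions.
learner_scores = [2.0, 1.5, -1.0]  # learner prefers action 0
oracle_losses = [1.0, 0.2, 0.0]    # oracle prefers action 2

oracle_pick = coach_action(learner_scores, oracle_losses, lam=0.0)
coach_pick = coach_action(learner_scores, oracle_losses, lam=1.0)
learner_pick = coach_action(learner_scores, oracle_losses, lam=100.0)

print(oracle_pick, coach_pick, learner_pick)
```

With an intermediate weight the coach suggests an action that is nearly as good as the oracle's but much closer to what the learner already prefers, which is the "not too hard" target the speaker describes.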
