[08] He He - Sequential Decisions and Predictions in NLP

The Thesis Review

Coaching: A Way to Improve Text Generation

The idea of coaching is: in this case, can I project the oracle's policy into the learner's hypothesis space? It's not really a projection; it's more like interpolation. But the intuition is that you want to derive something that's reachable by the learner, that's not too hard in that sense. It's also similar to trust-region-based methods, where you want to interpolate with the old policy so that you don't take too big a step away from your current policy.
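
To make the interpolation intuition concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the formulation from the thesis: the linear mixing rule, the weight `beta`, and the function names `softmax` and `coached_policy` are all hypothetical, and the action scores are made-up numbers.

```python
import math


def softmax(scores):
    """Turn raw action scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def coached_policy(learner_probs, oracle_probs, beta):
    """Interpolate between the learner's and the oracle's action distributions.

    Assumption: a plain linear mix. beta = 0 returns the learner's own
    policy, beta = 1 returns the oracle's policy, and values in between
    give a target that stays closer to what the learner already does.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be in [0, 1]")
    return [beta * o + (1.0 - beta) * l
            for o, l in zip(oracle_probs, learner_probs)]


# Hypothetical example: three actions, the oracle insists on the last one,
# while the learner currently prefers the first.
learner = softmax([2.0, 1.0, 0.1])
oracle = [0.0, 0.0, 1.0]
target = coached_policy(learner, oracle, beta=0.3)
```

With a small `beta`, the training target only nudges the learner toward the oracle's action, which mirrors the point about not stepping too far from the current policy.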
