
[08] He He - Sequential Decisions and Predictions in NLP
The Thesis Review
How to Adapt a Pre-Trained Language Model to a Conditional Task
If the controller had some kind of different, higher-level action space or something like that, then maybe there would be less overfitting to the reward function. I think the action space is a good point. So if you decouple the two, you're essentially operating in a smaller action space, whereas if you train them jointly, you have a much larger action space. And then in your thesis, I guess this was before kind of large-scale pre-trained models like BERT and GPT and things like that. But actually, now that you're describing this, it seems like it could be useful in that sense, where if we have a really large pre-trained model, then it...
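
One way to picture the decoupling being discussed: if only a small controller is trained against the reward while the large pre-trained generator stays frozen, the learner searches over a handful of high-level actions rather than the generator's full token-level space, which limits how much it can overfit the reward. The sketch below is purely illustrative, with hypothetical names (HIGH_LEVEL_ACTIONS, generate, reward); it is not from the episode or the thesis.

    # Minimal sketch: a small controller picks among a few high-level
    # actions (here, prompt templates); a frozen pre-trained generator
    # handles token-level generation. Only the controller is trained.
    import random

    # Controller's action space: three high-level choices, instead of
    # the generator's vocabulary^length token-level action space.
    HIGH_LEVEL_ACTIONS = ["summarize: ", "elaborate: ", "answer briefly: "]

    def generate(prompt: str, source: str) -> str:
        """Stand-in for a frozen pre-trained LM; fixed during training."""
        return prompt + source  # placeholder for actual decoding

    def reward(output: str) -> float:
        """Stand-in task reward (e.g., a learned or rule-based scorer)."""
        return -abs(len(output) - 40)  # toy preference for ~40 chars

    # Tabular controller policy; a simple epsilon-greedy bandit update
    # suffices because the decoupled action space is so small.
    values = [0.0] * len(HIGH_LEVEL_ACTIONS)
    counts = [0] * len(HIGH_LEVEL_ACTIONS)

    for step in range(200):
        if random.random() < 0.1:
            a = random.randrange(len(HIGH_LEVEL_ACTIONS))  # explore
        else:
            a = max(range(len(HIGH_LEVEL_ACTIONS)), key=lambda i: values[i])
        r = reward(generate(HIGH_LEVEL_ACTIONS[a], "some input text"))
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean

    best = max(range(len(HIGH_LEVEL_ACTIONS)), key=lambda i: values[i])
    print("learned high-level action:", HIGH_LEVEL_ACTIONS[best])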