
[08] He He - Sequential Decisions and Predictions in NLP
The Thesis Review
How to Adapt a Pre-Trained Language Model to a Conditional Task
If the controller had some kind of different, higher-level action space or something like that, then maybe there would be less overfitting to the reward function. I think the action space is a good point. So if you decouple the two, you're essentially operating in a smaller action space, whereas if you train them jointly, you have a much larger action space. And then in your thesis, I guess this was before kind of large-scale pre-trained models like BERT and GPT and things like that. But actually, now that you're describing this, it seems like it could be useful in that sense, where if we have a really large pre-trained model, then it...
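
One way to picture the decoupling being discussed: if only a small controller is trained against the reward while the large pre-trained generator stays frozen, the learner searches over a handful of high-level actions rather than the generator's full token-level space, which limits how much it can overfit the reward. The sketch below is purely illustrative, with hypothetical names (HIGH_LEVEL_ACTIONS, generate, reward); it is not from the episode or the thesis.

    # Minimal sketch: a small controller picks among a few high-level
    # actions (here, prompt templates); a frozen pre-trained generator
    # handles token-level generation. Only the controller is trained.
    import random

    # Controller's action space: three high-level choices, instead of
    # the generator's vocabulary^length token-level action space.
    HIGH_LEVEL_ACTIONS = ["summarize: ", "elaborate: ", "answer briefly: "]

    def generate(prompt: str, source: str) -> str:
        """Stand-in for a frozen pre-trained LM; fixed during training."""
        return prompt + source  # placeholder for actual decoding

    def reward(output: str) -> float:
        """Stand-in task reward (e.g., a learned or rule-based scorer)."""
        return -abs(len(output) - 40)  # toy preference for ~40 chars

    # Tabular controller policy; a simple epsilon-greedy bandit update
    # suffices because the decoupled action space is so small.
    values = [0.0] * len(HIGH_LEVEL_ACTIONS)
    counts = [0] * len(HIGH_LEVEL_ACTIONS)

    for step in range(200):
        if random.random() < 0.1:
            a = random.randrange(len(HIGH_LEVEL_ACTIONS))  # explore
        else:
            a = max(range(len(HIGH_LEVEL_ACTIONS)), key=lambda i: values[i])
        r = reward(generate(HIGH_LEVEL_ACTIONS[a], "some input text"))
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean

    best = max(range(len(HIGH_LEVEL_ACTIONS)), key=lambda i: values[i])
    print("learned high-level action:", HIGH_LEVEL_ACTIONS[best])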