The Language Model Is a Language Model That Works Like Humans

The anchor policies are dialogue-conditional. They always mix the anchor policy with the reinforcement-learned, or computed, policy in order to get a model that both performs well and plays like humans. So from here, the dialogue comes into this model, and then that information goes up here. But that's very, very indirect. Essentially, the only information that the planning has about the action is: what would a human do in this situation, given this board and this dialogue? That's the only information you have about the dialogue. You don't have the input dialogue directly, and your actions don't include what dialogue you're going to send. Here is the only at the output of this planning module
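One way to picture "mixing" an anchor policy with a computed policy is the minimal Python sketch below. It is an illustration under stated assumptions, not the method described in the episode: the speaker only says the two policies are mixed, so the specific rule used here (reweighting the anchor probabilities by exponentiated action values) is a swapped-in, commonly used form of regularization toward a human-like policy. The function name `blend_with_anchor`, the parameter `lam`, and the dictionary representation of policies are all hypothetical.

```python
import math


def blend_with_anchor(anchor_probs, action_values, lam):
    """Blend a human-like anchor policy with estimated action values.

    ASSUMPTION: the blend rule is pi(a) proportional to
    anchor(a) * exp(value(a) / lam). The transcript does not state
    the exact rule; this is only an illustrative choice.

    anchor_probs:  dict mapping action -> probability under the anchor policy
    action_values: dict mapping action -> estimated value of that action
    lam:           positive weight; large values stay close to the anchor,
                   small values favor high-value actions
    """
    if lam <= 0:
        raise ValueError("lam must be positive")

    # Work in log space so that large value / lam ratios do not overflow.
    scores = {
        a: math.log(p) + action_values[a] / lam
        for a, p in anchor_probs.items()
        if p > 0  # actions the anchor never plays keep probability zero
    }
    top = max(scores.values())
    weights = {a: math.exp(s - top) for a, s in scores.items()}
    total = sum(weights.values())

    blended = {a: 0.0 for a in anchor_probs}
    for a, w in weights.items():
        blended[a] = w / total
    return blended
```

With a hypothetical anchor such as `{"hold": 0.6, "support": 0.3, "attack": 0.1}` and values that favor `"attack"`, a large `lam` returns something close to the anchor, while a small `lam` puts nearly all probability on `"attack"`. Note that in this sketch the dialogue influences the result only through `anchor_probs`, which matches the speaker's point that the planning sees the dialogue only indirectly.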
