Is a Language Model a Model?
We previously spoke a little bit about one shortcoming of this preference model and the RLHF approach: not being aware of time and sequence and that kind of thing. Is the problem that you just described kind of another angle on the same problem, or are those two separate shortcomings of the way RL has been applied to language models thus far?

I think we've only scratched the surface of the potential for that, because right now the preference stuff is really trying to get these models to act more like people, to sort of mimic people. But I think they can do a lot more than that. They can actually use their deep understanding of the patterns in human behavior to be