Uncertainty Estimation for Language Reward Models
The first author is Adam Gleave, and you are the second and final author on this paper. The goal here was: we are doing reward modelling for, again, this sort of RL with human preferences task for language models. So it seems that that is changing the task of the reward model pretty substantially. You're giving it a lot more context. In the same way, you're helping the human and, hopefully, [unclear] helping the reward [unclear]. The next thing I want to talk about is this paper, Uncertainty Estimation for Language Reward Models.