16 - Preparing for Debate AI with Geoffrey Irving

AXRP - the AI X-risk Research Podcast

CHAPTER

Uncertainty Estimation for Language Reward Models

The first author is Adam Gleave, and you are the second and final author on this paper. The goal here was, we are doing reward modelling for, again, this sort of RL with human preferences task for language models. So it seems that that is changing the task of the reward model pretty substantially. You're giving it a lot more context. In the same way, you're helping the human and, hopefully, helping the reward model as well. [...] The next thing I want to talk about is this paper, "Uncertainty Estimation for Language Reward Models".
