TalkRL: The Reinforcement Learning Podcast

John Schulman


CHAPTER

Web Assisted Question Answering With Human Feedback

The noise isn't the thing that worries me the most. It's more that there are sometimes consistent biases that people have. For example, in settings like question answering, or settings where you have a model writing some text, often people prefer longer answers. You end up with these very verbose answers. I think it's going to be challenging to fully fix, but I think a big part of the story involves retrieval and having models write answers that contain citations to trusted sources.

