
John Schulman
TalkRL: The Reinforcement Learning Podcast
Web Assisted Question Answering With Human Feedback
The noise isn't the thing that worries me the most. It's more that there are sometimes consistent biases that people have. For example, in settings like question answering, or settings where you have a model writing some text, people often prefer longer answers, so you end up with these very verbose answers. I think it's going to be challenging to fully fix it, but I think a big part of this story involves retrieval and having models write answers that contain citations to trusted sources.