
John Schulman

TalkRL: The Reinforcement Learning Podcast


Web Assisted Question Answering With Human Feedback

The noise isn't the thing that worries me the most. It's more that there are sometimes consistent biases that people have. For example, in settings like question answering, or settings where you have a model writing some text, often people prefer longer answers, so you end up with these very verbose answers. I think it's going to be challenging to fully fix, but I think a big part of the story involves retrieval and having models write answers that contain citations: citations to trusted sources.
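As an aside on the length-bias point: here is a minimal, hypothetical sketch of how one might check pairwise preference data for the verbosity bias described above, by measuring how often the longer answer wins. The `Comparison` format and the toy data are illustrative assumptions, not a schema from any dataset discussed in the episode.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    answer_a: str
    answer_b: str
    preferred: str  # "a" or "b", as labeled by a human rater


def longer_answer_win_rate(comparisons: list[Comparison]) -> float:
    """Fraction of comparisons in which the longer answer was preferred.

    A value well above 0.5 suggests the kind of consistent verbosity bias
    described in the quote, which a reward model trained on this data
    would then tend to amplify.
    """
    wins, total = 0, 0
    for c in comparisons:
        len_a, len_b = len(c.answer_a.split()), len(c.answer_b.split())
        if len_a == len_b:
            continue  # equal-length pairs are uninformative for this check
        total += 1
        longer = "a" if len_a > len_b else "b"
        if c.preferred == longer:
            wins += 1
    return wins / total if total else 0.5


if __name__ == "__main__":
    data = [
        Comparison("Short answer.", "A much longer, more verbose answer with extra detail.", "b"),
        Comparison("Concise and correct.", "A padded response that repeats itself several times.", "b"),
    ]
    print(f"longer-answer win rate: {longer_answer_win_rate(data):.2f}")
```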
