Recursive Reward Models - Are We Going to Need That Soon?

In terms of pushing the boundaries of RLHF, what I would bet on, and maybe I'm just biased because this is literally my own work, is this idea of trying to get other forms of feedback than just humans comparing two answers and rating them. So I'm not saying my work is the perfect answer, but we were trying to get this type of implicit signal that you're getting during the interaction all the time. And so it doesn't cost anything additional to find it.
