
Natasha Jaques 2
TalkRL: The Reinforcement Learning Podcast
Recursive Reward Models - Are We Going to Need That Soon?
In terms of pushing the boundaries of RLHF, what I would bet on (and maybe I'm just biased because this is literally my own work) is still this idea of trying to get other forms of feedback than just humans comparing two answers and rating them. I'm not saying my work is the perfect answer, but we were trying to capture the kind of implicit signal you're getting during the interaction all the time. So it doesn't cost anything additional to obtain it.