The Different Applications of RL in Language Models
One thing that strikes me about these research projects is that they seem very different from the idea of just having AIs learn values kind of the way humans do, right? There's this supervised oversight thing, using some models to give us leverage in understanding what other models are doing.

I generally think that the current paradigm of self-supervised learning plus RL is already, to a great degree, doing what you should from a shard theory perspective. But I kind of think that's actually not that useful, or rather, I think it happens by default in the current paradigm.