
Episode 22: Archit Sharma, Stanford, on unsupervised and autonomous reinforcement learning
Generally Intelligent
Is There a Master Objective in a Language Model?
Some of this reinforcement learning with human feedback that they do on the large language models, that, like, Anthropic is doing, for example, does actually seem to make the personality kind of more consistent. Like, they're doing RLHF for helpfulness, I think, is the primary thing. And so then the model does kind of become more helpful as a persona. That's really interesting. Yeah, like, humans are not good at dealing with super high variance personalities. So yeah, I mean, it would be nice to just have consistent models, even if they're wrong, but at least they're wrong in ways that we know about.