Episode 22: Archit Sharma, Stanford, on unsupervised and autonomous reinforcement learning

Generally Intelligent

Is There a Master Objective in a Language Model?

Some of this reinforcement learning from human feedback that they do on the large language models, that, like, Anthropic is doing, for example, does actually seem to make the personality kind of more consistent. Like, they're doing RLHF for helpfulness, I think, is the primary thing. And so then the model does kind of become more helpful as a person. That's really interesting. Yeah, humans are not good at dealing with super high-variance personalities. So yeah, I mean, it would be nice to just have consistent models, even if they're wrong; at least they're wrong in ways that we know about.
