
Episode 22: Archit Sharma, Stanford, on unsupervised and autonomous reinforcement learning
Generally Intelligent
Is There a Master Objective in a Language Model?
Some of this reinforcement learning with human feedback that they do on the large language models, that, like, Anthropic is doing, for example, does actually seem to make the personality kind of more consistent. Like, they're doing RLHF for helpfulness, I think, is the primary thing. And so then the model does kind of become more helpful as a persona. That's really interesting. Yeah, like, humans are not good at dealing with super high variance personalities. So yeah, I mean, it would be nice to just have consistent models, even if they're wrong, but at least they're wrong in ways that we know about.