The Application of RL to Language Models
The strategy model relies on RL. It works in a way that is roughly analogous to the methodology Noam Brown developed during his PhD for playing poker: he essentially took what he had built for poker and applied it to Diplomacy, a partial-information game. That's how that later work came about. We can do things like build up mathematical definitions of deception and manipulation, specify those as things we do or don't want, and potentially even figure out ways to detect them.
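The conversation doesn't spell out the algorithm. For readers unfamiliar with the poker lineage, Brown's poker bots were built on counterfactual regret minimization (CFR), whose core update is regret matching. The sketch below is a hypothetical illustration of that building block only, not the Diplomacy agent's actual code. It runs regret matching in self-play on an invented "biased" rock-paper-scissors variant (Rock beating Scissors pays 2), whose equilibrium works out to (1/4, 1/2, 1/4). The payoff matrix and all function names are assumptions made for the example.

```python
"""Regret matching in self-play on a biased rock-paper-scissors game.

Hypothetical sketch of the core update behind CFR-style poker solvers;
not the code of any system discussed in the episode.
"""

# Row player's payoff (zero-sum: the column player receives the negative).
# Invented variant: Rock beating Scissors pays 2 instead of 1.
PAYOFF = [
    [0, -1, 2],   # Rock     vs (Rock, Paper, Scissors)
    [1, 0, -1],   # Paper
    [-2, 1, 0],   # Scissors
]
N = len(PAYOFF)


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / N] * N  # no positive regret yet: play uniformly


def action_values(payoff, opponent):
    """Expected payoff of each pure action against a mixed opponent strategy."""
    return [sum(payoff[a][b] * opponent[b] for b in range(N)) for a in range(N)]


def regret_matching(iterations):
    # Column player's payoffs from its own point of view: col[a][b] = -PAYOFF[b][a].
    col_payoff = [[-PAYOFF[b][a] for b in range(N)] for a in range(N)]
    payoffs = [PAYOFF, col_payoff]
    regrets = [[0.0] * N, [0.0] * N]
    strategy_sums = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        # Both players act simultaneously from their current regrets.
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in range(2):
            values = action_values(payoffs[player], strategies[1 - player])
            expected = sum(s * v for s, v in zip(strategies[player], values))
            for a in range(N):
                # Regret: how much better action a would have done than the mix.
                regrets[player][a] += values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    # The *average* strategy, not the last one, is what approaches equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]


def exploitability(strategy):
    """Best-response gain against `strategy` (game value is 0).

    PAYOFF is skew-symmetric, so this works for either player's strategy.
    """
    return max(-sum(strategy[a] * PAYOFF[a][b] for a in range(N)) for b in range(N))


if __name__ == "__main__":
    row_avg, col_avg = regret_matching(50_000)
    print([round(p, 3) for p in row_avg])
    print(exploitability(row_avg))
```

The design choice worth noting is that each player only tracks how much better every action would have done in hindsight. That simple rule, repeated, drives the average strategy toward equilibrium in two-player zero-sum games, which is why it scales to imperfect-information settings like poker.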