AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
The Reward Model Is a Discriminator in a Reward Learning Loop
The pretrained language model in OpenAI's GPT-3 is used as the policy in a reinforcement learning loop,./nThe reward model is used to simulate human feedback and to fine tune the policy.