
ChatGPT goes prime time!
Practical AI
The Reward Model Is a Discriminator in a Reward Learning Loop
OpenAI's pretrained GPT-3 language model is used as the policy in a reinforcement learning loop. The reward model simulates human feedback and is used to fine-tune the policy.
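To make the loop concrete, here is a minimal toy sketch, not the actual ChatGPT training code. It assumes a "policy" that only chooses among three canned responses, a hand-written `reward_model` standing in for a model trained on human preference labels, and a plain REINFORCE update in place of the more elaborate optimizer a production system would likely use. All names (`RESPONSES`, `reward_model`, `fine_tune`) are hypothetical.

```python
import math
import random

# Hypothetical canned responses the toy policy can choose between.
RESPONSES = ["rude reply", "vague reply", "helpful reply"]


def reward_model(response: str) -> float:
    """Stand-in for a learned reward model: scores a response the way
    a human rater might (assumed scores, not real data)."""
    scores = {"rude reply": -1.0, "vague reply": 0.0, "helpful reply": 1.0}
    return scores[response]


def softmax(logits):
    """Turn raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fine_tune(steps=2000, lr=0.1, seed=0):
    """Fine-tune the toy policy against the reward model with REINFORCE."""
    rng = random.Random(seed)
    # The "pretrained" policy starts with no preference among responses.
    logits = [0.0] * len(RESPONSES)
    for _ in range(steps):
        probs = softmax(logits)
        # The policy samples a response.
        i = rng.choices(range(len(RESPONSES)), weights=probs)[0]
        # The reward model scores it in place of a human rater.
        r = reward_model(RESPONSES[i])
        # REINFORCE: d log pi(i) / d logit_j = 1[j == i] - probs[j].
        for j in range(len(logits)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * r * grad
    return softmax(logits)


if __name__ == "__main__":
    for response, p in zip(RESPONSES, fine_tune()):
        print(f"{response}: {p:.3f}")
```

Running the script prints the policy's final probabilities; most of the mass should end up on the response the reward model scores highest. The point of the sketch is the division of labor: the policy proposes, the reward model scores, and the score drives the policy update without a human in the loop at each step.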