
ChatGPT goes prime time!

Practical AI


The Reward Model Is a Discriminator in a Reward Learning Loop

OpenAI's pretrained GPT-3 language model serves as the policy in a reinforcement learning loop. A reward model stands in for human feedback, scoring the policy's outputs, and those scores are used to fine-tune the policy.
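The loop described above can be sketched in a few lines. This is a toy illustration, not the method used for any real model: the "policy" is a softmax over three hypothetical canned responses rather than a language model, `reward_model` is a made-up lookup table standing in for a learned scorer, and the update is plain REINFORCE with a baseline (real systems typically use PPO with additional constraints).

```python
import math
import random

# Hypothetical candidate responses the toy policy can choose between.
RESPONSES = ["unhelpful reply", "partly helpful reply", "helpful reply"]


def reward_model(response: str) -> float:
    """Stand-in for a learned reward model: returns a scalar score.

    The scores are invented for illustration; a real reward model would be
    trained on human preference data.
    """
    scores = {
        "unhelpful reply": 0.1,
        "partly helpful reply": 0.4,
        "helpful reply": 0.9,
    }
    return scores[response]


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fine_tune(steps=2000, lr=0.1, seed=0):
    """Fine-tune the toy policy against the reward model with REINFORCE."""
    rng = random.Random(seed)
    # The "pretrained" policy: uniform over responses in this sketch.
    logits = [0.0] * len(RESPONSES)
    for _ in range(steps):
        probs = softmax(logits)
        # 1. The policy samples a response.
        action = rng.choices(range(len(RESPONSES)), weights=probs)[0]
        # 2. The reward model scores it in place of a human rater.
        reward = reward_model(RESPONSES[action])
        # 3. Baseline: expected reward under the current policy.
        baseline = sum(p * reward_model(r) for p, r in zip(probs, RESPONSES))
        advantage = reward - baseline
        # 4. REINFORCE update: d log pi(a) / d logit_j = 1[j == a] - p_j.
        for j in range(len(logits)):
            grad = (1.0 if j == action else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    for response, p in zip(RESPONSES, fine_tune()):
        print(f"{response}: {p:.3f}")
```

Running the script prints the final probability of each response. Probability mass should shift toward the response the reward model scores highest, which is the sense in which the reward model's scores fine-tune the policy.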
