
ChatGPT, Transformers and Attention
The AI Buzz from Lightning AI
How Human Feedback Trains a Reward Model
Josh asks about human ranking, and Luca clarifies that humans create rankings that are used to train a reward model, which in turn is used to improve the larger model via reinforcement learning (RL).
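
As a rough illustration of the first step described here, the sketch below fits a tiny reward model from ranked pairs. It is not taken from the episode: the linear model, the two-number feature vectors, the pairwise logistic loss, and every name (`reward`, `pairwise_loss`, `train`, `pairs`) are assumptions chosen to keep the example small. The later RL step that uses the reward model to improve the larger model is not shown.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reward(weights, features):
    # Hypothetical linear reward model: a weighted sum of response features.
    return sum(w * f for w, f in zip(weights, features))

def pairwise_loss(weights, preferred, rejected):
    # Low when the human-preferred response scores higher than the rejected one.
    margin = reward(weights, preferred) - reward(weights, rejected)
    return -math.log(sigmoid(margin))

def train(pairs, dim, lr=0.1, epochs=200):
    # Plain gradient descent on the pairwise loss, one ranked pair at a time.
    weights = [0.0] * dim
    for _ in range(epochs):
        for preferred, rejected in pairs:
            margin = reward(weights, preferred) - reward(weights, rejected)
            scale = 1.0 - sigmoid(margin)  # magnitude of the loss gradient w.r.t. the margin
            for i in range(dim):
                weights[i] += lr * scale * (preferred[i] - rejected[i])
    return weights

# Made-up data: each pair is (features of the response a human ranked higher,
# features of the response ranked lower).
pairs = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.8, 0.1], [0.2, 0.9]),
    ([0.9, 0.3], [0.1, 0.4]),
]
weights = train(pairs, dim=2)

for preferred, rejected in pairs:
    print(reward(weights, preferred) > reward(weights, rejected))
```

After training, the learned reward scores the human-preferred response above the rejected one for each pair in this toy data, which is the signal an RL step would then optimize against.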