
Big Data, Reinforcement Learning and Aligning Models
The AI Buzz from Lightning AI
How are model outputs evaluated and ranked?
Luca explains how sampled model outputs are ranked by human labelers, how a reward model is trained from those rankings, and how that reward model is then used to align the model.
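The ranking-to-reward-model step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method discussed in the episode. It uses the pairwise (Bradley-Terry style) loss popularized by InstructGPT, -log σ(r_preferred - r_rejected), applied to every pair implied by a human ranking. It replaces the large neural network that would normally score text with a tiny linear model over hand-made feature vectors. All names (`ranking_to_pairs`, `train_reward_model`, the outputs "a", "b", "c" and their features) are hypothetical illustrations.

```python
import itertools
import math


def ranking_to_pairs(ranked):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs.

    A ranking of K outputs yields K * (K - 1) / 2 comparisons.
    """
    return list(itertools.combinations(ranked, 2))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reward(w, x):
    # Hypothetical linear reward model: dot product of weights and features.
    return sum(wi * xi for wi, xi in zip(w, x))


def pairwise_loss(r_preferred, r_rejected):
    # -log sigmoid(r_preferred - r_rejected), written as log(1 + exp(-d)).
    return math.log1p(math.exp(-(r_preferred - r_rejected)))


def mean_ranking_loss(w, features, rankings):
    losses = [
        pairwise_loss(reward(w, features[win]), reward(w, features[lose]))
        for ranked in rankings
        for win, lose in ranking_to_pairs(ranked)
    ]
    return sum(losses) / len(losses)


def train_reward_model(features, rankings, lr=0.5, epochs=200):
    """Fit weights by full-batch gradient descent on the mean pairwise loss."""
    dim = len(next(iter(features.values())))
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        n_pairs = 0
        for ranked in rankings:
            for win, lose in ranking_to_pairs(ranked):
                d = reward(w, features[win]) - reward(w, features[lose])
                coeff = -(1.0 - sigmoid(d))  # derivative of the loss w.r.t. d
                for i in range(dim):
                    grad[i] += coeff * (features[win][i] - features[lose][i])
                n_pairs += 1
        w = [wi - lr * gi / n_pairs for wi, gi in zip(w, grad)]
    return w


# Hypothetical feature vectors for three sampled outputs to one prompt.
features = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 0.0]}
# One labeler's ranking of those outputs, best to worst.
rankings = [["b", "a", "c"]]

w = train_reward_model(features, rankings)
print(sorted(features, key=lambda k: reward(w, features[k]), reverse=True))
# → ['b', 'a', 'c']
print("loss before:", mean_ranking_loss([0.0, 0.0], features, rankings))
print("loss after:", mean_ranking_loss(w, features, rankings))
```

Expanding one ranking into all of its pairwise comparisons extracts more training signal per labeling session than a single "best answer" label would. In a full RLHF pipeline, the trained reward model would then score fresh samples from the policy and supply the reward for a reinforcement learning step such as PPO, which this sketch leaves out.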