
Big Data, Reinforcement Learning and Aligning Models
The AI Buzz from Lightning AI
How are model outputs evaluated and ranked?
Luca explains ranking sampled outputs, training a reward model from human rankings, and using it for alignment.
Play episode from 20:34
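
The reward-model step mentioned in the description can be illustrated with a small sketch. It is not taken from the episode: it assumes a Bradley-Terry style pairwise loss, a common choice when learning from human rankings, and the names `ranking_to_pairs` and `pairwise_loss` are hypothetical. A human ranking of sampled outputs is expanded into (preferred, rejected) pairs, and the loss is small when the reward model scores the preferred output higher.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs.

    combinations() keeps input order, so the first element of each
    pair is always the one the human ranked higher.
    """
    return list(combinations(ranked_outputs, 2))


def pairwise_loss(score_preferred, score_rejected):
    """Assumed Bradley-Terry style loss: -log(sigmoid(preferred - rejected)).

    Written in a numerically stable form so large score gaps do not overflow.
    """
    diff = score_preferred - score_rejected
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# Hypothetical example: three sampled outputs ranked by a human, best first.
pairs = ranking_to_pairs(["output_a", "output_b", "output_c"])

# Hypothetical reward-model scores; a real model would compute these.
scores = {"output_a": 1.5, "output_b": 0.2, "output_c": -0.7}
total_loss = sum(pairwise_loss(scores[p], scores[r]) for p, r in pairs)
```

In training, the scores would come from the reward model and the summed loss would be minimized over many human-ranked samples; the trained model can then score new outputs during alignment.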


