
RLHF: A thin line between useful and lobotomized
Interconnects
Exploring Preference Alignment in Data Sets for Model Evaluation
This chapter explores the biases within GPT-4 and their influence on the datasets used for model evaluation, along with the role of alternative models like Alpaca and Vicuna. It discusses credit assignment in sequences, the effects of preference alignment on model outputs, and how training data varies across different models.