

Fine-tuning and Preference Alignment in a Single Streamlined Process
Jun 13, 2024
Jiwoo Hong and Noah Lee from KAIST AI discuss ORPO, their method that combines supervised fine-tuning and preference alignment in a single step. They highlight the advantages of the approach, such as minimal data requirements, bias prevention, and enhanced adaptability of language models. The ORPO method has received positive feedback from the research community and industry for aligning models efficiently and scaling with smaller datasets.
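
For listeners unfamiliar with the method, below is a minimal sketch of the ORPO objective as published by Hong et al. (2024): a standard supervised fine-tuning loss on the preferred response plus a weighted odds-ratio penalty that contrasts the preferred and rejected responses, which is what lets a single training step replace the usual SFT-then-alignment pipeline. The function name, tensor shapes, and the lam=0.1 default here are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, lam=0.1):
    """Sketch of the ORPO objective (illustrative, not the reference code).

    chosen_logps / rejected_logps: length-normalized (average per-token)
    log-probabilities of the preferred and rejected responses under the
    policy model, shape (batch,).
    lam: weight on the odds-ratio term (assumed default).
    """
    # log(odds) = log(p / (1 - p)) = log p - log(1 - exp(log p))
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))

    # Odds-ratio term: pushes the odds of the chosen response above
    # the odds of the rejected one.
    odds_ratio_term = F.logsigmoid(log_odds_chosen - log_odds_rejected)

    # Single objective: SFT negative log-likelihood on the chosen response
    # plus the weighted odds-ratio penalty.
    sft_nll = -chosen_logps
    return (sft_nll - lam * odds_ratio_term).mean()

# Toy usage with made-up average log-probs for two preference pairs:
chosen = torch.tensor([-0.8, -1.2])
rejected = torch.tensor([-1.5, -1.4])
print(orpo_loss(chosen, rejected))
```

Because the contrast is expressed through odds of the policy model itself, no frozen reference model or separately trained reward model is needed, which is where the data-efficiency claims in the episode come from.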
Chapters
Intro
00:00 • 2min
Streamlined Preference Alignment with ORPO: Multinomial Logistic Models in Deep Learning
02:22 • 20min
Analysis of the Reaction to the ORPO Method in the Research Community and Industry
22:26 • 3min
Discussion on Industry Collaborations, Oracle Datasets, and Preference Alignment
25:05 • 4min
Efficient Alignment with the ORPO Method and Scalability Testing
29:32 • 4min
Exploring Modalities and Scaling ML Platforms
33:33 • 2min