Fine-tuning and Preference Alignment in a Single Streamlined Process
Jun 13, 2024
Jiwoo Hong and Noah Lee from KAIST AI discuss their method ORPO, which combines supervised fine-tuning and preference alignment in a single step. They highlight the advantages of their approach, such as minimal data requirements, bias prevention, and enhanced adaptability of language models. The ORPO method has received positive feedback from the research community and industry for aligning and scaling models efficiently with smaller datasets.
ORPO combines supervised fine-tuning and preference learning in a single streamlined process built on the odds ratio concept.
ORPO eliminates the separate alignment stage and its dedicated dataset, making preference alignment and fine-tuning more cost-efficient.
Deep dives
Overview of ORPO Methodology and Integration of Supervised Fine-Tuning and Preference Learning
ORPO, which stands for Odds Ratio Preference Optimization, performs supervised fine-tuning and preference learning simultaneously. By applying the odds ratio concept to preference learning in deep learning, the method folds the preference-alignment step that approaches like DPO or RLHF handle separately into a single streamlined process alongside SFT.
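To make the core idea concrete, here is a tiny numeric illustration of odds and the odds ratio; the numbers are invented for illustration and are not from the episode:

```python
# Odds of an outcome with probability p, and the odds ratio between two outcomes.
def odds(p: float) -> float:
    return p / (1.0 - p)

# Illustrative numbers: if the model assigns the preferred response probability 0.4
# and the rejected response probability 0.1, the odds ratio is
# (0.4 / 0.6) / (0.1 / 0.9) ~= 6.0, i.e. the preferred response is about six times
# more likely in odds terms. ORPO's training objective pushes this ratio up.
odds_ratio = odds(0.4) / odds(0.1)
print(odds_ratio)
```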
Efficiency of ORPO in Preference Alignment and Fine-Tuning
ORPO streamlines preference alignment and fine-tuning by combining the SFT loss and the odds ratio (OR) loss in a single stage. This eliminates the need for separate stages and datasets, aligning language models to preferences cost-effectively with datasets of roughly 7k to 15k examples, in contrast with traditional pipelines that can require up to 200k.
Integration of Odds Ratio for Optimization in ORPO
ORPO leverages odds ratio concepts, adapted from multinomial logit models, to measure the relative likelihood of different events. Implementing the odds ratio in deep learning makes preference learning efficient and offers practical advantages in model training. By combining an odds ratio loss with the SFT loss, ORPO balances fine-tuning and alignment in a single step.
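As a rough sketch of how the two terms might be combined, the snippet below folds an SFT term and an odds-ratio term into one objective. The function name, the weighting parameter `lambda_weight`, and the use of average per-token log-probabilities are assumptions made for illustration, not the authors' reference implementation:

```python
import torch
import torch.nn.functional as F

def orpo_style_loss(chosen_logps, rejected_logps, chosen_mask, rejected_mask,
                    lambda_weight: float = 0.1) -> torch.Tensor:
    """Single-step objective: SFT loss on the preferred response plus a
    weighted odds-ratio penalty against the rejected response.

    chosen_logps / rejected_logps: per-token log-probabilities from the policy
    model, shape (batch, seq_len); *_mask marks response tokens (1) vs. prompt
    or padding tokens (0).
    """
    # Average log-probability per response token, standing in for log P(y|x).
    logp_w = (chosen_logps * chosen_mask).sum(-1) / chosen_mask.sum(-1)
    logp_l = (rejected_logps * rejected_mask).sum(-1) / rejected_mask.sum(-1)

    # SFT term: negative log-likelihood of the preferred (chosen) response.
    sft_loss = -logp_w.mean()

    # Odds-ratio term: log odds(y_w | x) - log odds(y_l | x), where
    # odds(y | x) = P(y | x) / (1 - P(y | x)).
    log_odds = (logp_w - torch.log1p(-torch.exp(logp_w))) \
             - (logp_l - torch.log1p(-torch.exp(logp_l)))
    or_loss = -F.logsigmoid(log_odds).mean()

    # One combined objective: fine-tuning and alignment in a single step.
    return sft_loss + lambda_weight * or_loss
```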
Scalability and Application of ORPO in Large Language Models
ORPO's scalability is tested on models ranging from OPT 125M to Mistral 7B, showing competitive results with efficient training times. The methodology excels at aligning language models, adapts to various tasks, and delivers promising outcomes even with smaller datasets. ORPO's open-license approach and collaboration with Hugging Face point to accessibility and potential application across different domains and models.
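For readers who want to try it, the sketch below shows roughly how ORPO can be run through Hugging Face's TRL library. The dataset name is a placeholder, and the exact ORPOTrainer arguments may differ between TRL versions, so treat this as an outline rather than a verified recipe:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder: any preference dataset with "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("your-org/your-preference-dataset", split="train")

# beta weights the odds-ratio term relative to the SFT term.
config = ORPOConfig(output_dir="mistral-orpo", beta=0.1, max_length=2048)
trainer = ORPOTrainer(model=model, args=config, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()
```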