

Multimodal LM roundup: Unified IO 2, inputs and outputs, Gemini, LLaVA-RLHF, and RLHF questions
Jan 10, 2024
This episode covers recent developments in multimodal language models, including the Unified IO 2 model, collecting preference data for images, the LLaVA-RLHF experiments, and open challenges in multimodal RLHF. It explores the architecture and challenges of multimodal models, the potential of GPT-4V in multimodal RLHF, and how RLHF techniques are applied to multimodal models. It also discusses the need for clearer terminology and the adoption of synthetic data in this area.
Chapters
Introduction
00:00 • 4min
Exploring the Architecture and Challenges of Multimodal Models
03:30 • 4min
Collecting Preference Data for Images and the Potential of GPT-4V in Multimodal RLHF
07:39 • 2min
Multimodal RLHF Fine-Tuning: LLaVA, Factually Augmented RLHF, and MMHal-Bench
09:34 • 3min
Exploring RLHF Techniques in Multimodal Models and Challenges in Multimodal RLHF
12:33 • 3min