Multimodal LM roundup: Unified IO 2, inputs and outputs, Gemini, LLaVA-RLHF, and RLHF questions
Jan 10, 2024
This podcast covers recent developments in the multimodal space: the Unified IO 2 model, collecting preference data for images versus text, LLaVA-RLHF experiments, and the open challenges in multimodal RLHF. The hosts explore the architecture choices behind multimodal models, the potential role of GPT-4V in multimodal RLHF pipelines, the need for clearer terminology, and the growing adoption of synthetic data in this context.
Multimodal models enable large language models to understand visual information, and encoder-decoder designs offer more versatility than decoder-only models.
Unified IO 2 is the first auto-regressive multimodal model capable of understanding and generating images, text, audio, and action.
Deep dives
Multimodal Models and Their Importance
Multimodal models aim to let large language models understand visual information. With the increasing prevalence of visual media, image inputs provide a richer training set. On the output side, models like Gemini can natively generate images, which opens up new possibilities for creative work and information processing. By separating generation from information processing, multimodal models can follow an encoder-decoder architecture, offering more versatility than decoder-only models.
Unified IO2 and Tokenization Challenges
Unified IO 2 is the first auto-regressive multimodal model capable of understanding and generating images, text, audio, and action. To unify modalities, inputs and outputs are tokenized into a shared semantic space and processed by a single encoder-decoder transformer model. Tokenization is a core challenge for multimodal models, particularly sharing one token space across very different signals. Comparing Unified IO 2 to models like Flamingo and Gemini highlights differences in architecture and training methods.
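The shared-token-space idea can be sketched in a few lines. This is a simplified illustration of the general technique, not Unified IO 2's actual tokenizer: each modality is first discretized by its own tokenizer or codebook (the vocabulary sizes below are assumptions), and the modality-local ids are offset into disjoint ranges of one shared vocabulary so a single transformer can consume any mix of modalities as one token sequence.

```python
# Assumed vocabulary sizes for this sketch (not Unified IO 2's real values).
TEXT_VOCAB = 32_000   # text subword vocabulary
IMAGE_VOCAB = 8_192   # image VQ codebook
AUDIO_VOCAB = 4_096   # audio codebook

# Offset each modality into a disjoint slice of the shared id space.
TEXT_OFFSET = 0
IMAGE_OFFSET = TEXT_OFFSET + TEXT_VOCAB
AUDIO_OFFSET = IMAGE_OFFSET + IMAGE_VOCAB

def to_shared_ids(modality: str, local_ids: list[int]) -> list[int]:
    """Map modality-local token ids into the shared vocabulary."""
    offset = {"text": TEXT_OFFSET, "image": IMAGE_OFFSET, "audio": AUDIO_OFFSET}[modality]
    return [offset + i for i in local_ids]

# A mixed sequence as the transformer would see it: text tokens, then image codes.
sequence = to_shared_ids("text", [5, 17]) + to_shared_ids("image", [0, 1])
# Image code 0 lands at id 32_000, disjoint from every text id.
```

In a real system each slice of the shared vocabulary maps back to its own decoder (detokenizer, image decoder, vocoder), which is what lets one auto-regressive model both understand and generate across modalities.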
Multimodal RLHF and Data Challenges
Multimodal RLHF, as in LLaVA-RLHF and RLHF-V, involves fine-tuning large language-and-vision models with reinforcement learning from human feedback. The challenge lies in aligning modalities and avoiding hallucination, where textual outputs are not grounded in the multimodal context. Generating high-quality datasets for multimodal RLHF is difficult, and collecting preference data over images is harder still than collecting it over text. Synthetic data generation may offer a cost-effective alternative to querying large-scale models like GPT-4 for RLHF data.
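The preference data discussed above is typically consumed by a reward model trained with a pairwise (Bradley-Terry style) loss, the same objective used in text-only RLHF. A minimal sketch, with placeholder scalar rewards; in a multimodal setup these scores would come from a reward model conditioned on both the image and the candidate text response:

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected): small when the chosen
    response scores well above the rejected one."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the chosen response's reward pulls ahead.
confident = preference_loss(2.0, -1.0)  # chosen clearly preferred
uncertain = preference_loss(0.1, 0.0)   # nearly tied pair
```

Whether the preference pairs come from human annotators, from a stronger model like GPT-4V, or from synthetic generation only changes how `reward_chosen`/`reward_rejected` pairs are labeled, not this objective, which is why data collection cost dominates the discussion.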