

Scaling Multi-Modal Generative AI with Luke Zettlemoyer - #650
Oct 9, 2023
In this discussion, Luke Zettlemoyer, a University of Washington professor and research manager at Meta, dives into the realm of multimodal generative AI. He highlights the transformative impact of integrating text and images, illustrating advancements like DALL-E 3. Zettlemoyer explains the significance of open science for AI development and the complex role data plays in improving model performance. Topics also include the role of self-alignment in training and the future of multimodal AI amid rising compute costs and the need for better evaluation methods.
AI Snips
LLMs as Complex Systems
- LLMs have shifted AI research from engineering to complex systems science.
- Researchers now study the emergent behaviors of LLMs empirically, much as natural scientists study phenomena they did not design.
Data Limitations and Multimodal Models
- Text-only LLMs will eventually run out of training data.
- Multimodal models offer a solution by incorporating diverse data like images and videos.
DALL-E 3 and Multimodal Learning
- DALL-E 3's image generation demonstrates a deeper understanding of spatial relationships and compositionality in text.
- This suggests that multimodal models learn information not present in text alone.