
Udio & the age of multi-modal AI (Practical AI #265)
Changelog Master Feed
00:00
Evolution of Multimodal Functionality in AI
This chapter traces the progress of multimodal AI from specialized, single-task models to versatile ones such as GPT Vision and models built with visual instruction tuning, which accept both text and visual inputs. It also covers training a projection matrix to connect separate model architectures, enabling tasks such as visual question answering and automated reasoning over images.
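To make the projection-matrix idea concrete, here is a minimal sketch in plain Python. It assumes one common setup that the summary does not spell out: a vision encoder emits one feature vector per image patch, and a single learned matrix maps those vectors into the language model's token-embedding space so they can sit in the same input sequence as the text tokens. The function names, the toy dimensions, and the random initialization are all hypothetical and chosen for illustration; in a real system the matrix would be trained rather than left random.

```python
import random


def make_projection(vision_dim, text_dim, seed=0):
    """Build a vision_dim x text_dim matrix of small random weights.

    Hypothetical initialization: a trained system would learn these values.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(text_dim)]
            for _ in range(vision_dim)]


def project(patch_features, weights):
    """Map each patch feature (length vision_dim) to a vector of length text_dim."""
    text_dim = len(weights[0])
    return [
        [sum(value * weights[i][j] for i, value in enumerate(patch))
         for j in range(text_dim)]
        for patch in patch_features
    ]


def build_input_sequence(patch_features, text_embeddings, weights):
    """Place the projected image tokens in front of the text token embeddings."""
    return project(patch_features, weights) + text_embeddings


# Toy example: two image patches (dimension 4) and one text token (dimension 3).
vision_dim, text_dim = 4, 3
W = make_projection(vision_dim, text_dim)
patches = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
text = [[0.1, 0.2, 0.3]]
sequence = build_input_sequence(patches, text, W)
print(len(sequence), "tokens, each of dimension", len(sequence[0]))
```

After projection, the image patches and the text token share one dimensionality, which is what allows a single language model to attend over both when answering a question about an image.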