How to train your own Large Multimodal Model — with Hugo Laurençon & Leo Tronchon of HuggingFace M4

Latent Space: The AI Engineer Podcast

2024: The Year of Multi-Modality in AI Engineering

OpenAI's Logan declared 2024 the year of multi-modality, with a focus on multi-modal models, including vision and image generation capabilities. OpenAI has already incorporated multi-modal capabilities into ChatGPT and its iOS app, and DALL-E 3 is expected to take image generation to the next level. The Latent Space podcast plans to offer deeper dives into multi-modality in 2024, and Hugging Face has trained an open-source reproduction of DeepMind's model with a large parameter count.
