Latent Space: The AI Engineer Podcast

How to train your own Large Multimodal Model — with Hugo Laurençon & Leo Tronchon of HuggingFace M4

Jan 19, 2024
Hugo Laurençon and Leo Tronchon from Hugging Face discuss their work on IDEFICS, an open multimodal model, and OBELICS, the web-scale interleaved image-text dataset it was trained on. They dive into the evolution of multimodal training, sharing challenges around data quality and the intricacies of extracting interleaved documents from raw HTML. The conversation highlights why image resolution matters for OCR and the hurdles of processing video data. Both researchers are optimistic that open-source models can close the performance gap while tackling issues like hallucination.
AI Snips
INSIGHT

Hugging Face's Open-Source Incentive

  • Hugging Face is incentivized to release the best open-source models.
  • Other companies prioritize keeping the best models in-house for commercial advantage.
ANECDOTE

Internal Tool Usage and Testing

  • The Hugging Face team does its research with its own tools: Datasets, Transformers, and Diffusers.
  • Building OBELICS served as a stress test of the Datasets library's scalability, given the dataset's massive size.
ADVICE

Efficient Data Handling

  • Use the Datasets library to manage large datasets efficiently.
  • It simplifies filtering, mapping, and parallel operations, and handles data that exceeds memory capacity (see the sketch below).
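For context, here is a minimal sketch of that pattern with the datasets library. The HuggingFaceM4/OBELICS dataset id is real, but the "images" and "texts" column names are assumptions about its schema, and this is an illustration rather than the team's actual pipeline.

```python
from datasets import load_dataset

# Streaming mode: records are fetched lazily as you iterate,
# so the dataset never has to fit in memory (or fully on disk).
streamed = load_dataset("HuggingFaceM4/OBELICS", split="train", streaming=True)
first = next(iter(streamed))

# Arrow-backed mode: download once, then Datasets memory-maps the
# files from disk, so map/filter still work on data larger than RAM.
ds = load_dataset("HuggingFaceM4/OBELICS", split="train")

# Parallel filter: keep documents with at least one image
# ("images" is an assumed column name).
ds = ds.filter(lambda ex: any(i is not None for i in ex["images"]), num_proc=8)

# Parallel, batched map: count the text blocks per document
# ("texts" is likewise an assumed column name).
ds = ds.map(
    lambda batch: {"num_texts": [len(t) for t in batch["texts"]]},
    batched=True,
    num_proc=8,
)
```

Streaming suits one-pass consumption during training; the Arrow-backed mode pays a one-time download cost but gives random access plus cached, parallelized transforms.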