Latent Space: The AI Engineer Podcast

2024 in Vision [LS Live @ NeurIPS]

Dec 22, 2024
Isaac Robinson and Peter Robicheaux from Roboflow survey the key trends and most notable papers in computer vision for 2024. They highlight the shift toward video-based models such as Sora and advances in real-time object detection. Vik Korrapati, founder of Moondream, discusses the challenges of developing vision language models and introduces a compact, pruned model. Together, they explore how these developments could reshape computer vision and make pre-trained models more efficient.
INSIGHT

Vision Model Trends

  • A shift is occurring from image-based models to video-based models.
  • DETRs are now outperforming YOLO models in real-time object detection.
ANECDOTE

Sora's Impact

  • Sora is a groundbreaking model for video generation, producing high-quality, minute-long 1080p videos.
  • It surpasses previous models such as MAGVIT in both quality and length.
ANECDOTE

SAM's Efficiency

  • SAM has significantly reduced labeling time for Roboflow users.
  • SAM2 extends this to video, allowing for object tracking and segmentation.