Latent Space: The AI Engineer Podcast

SAM 3: The Eyes for AI — Nikhila & Pengchuan (Meta Superintelligence), ft. Joseph Nelson (Roboflow)

Dec 18, 2025
Nikhila Ravi leads the Segment Anything project at Meta, and Pengchuan Zhang contributes as a researcher specializing in vision models. They discuss the groundbreaking SAM 3, which enables concept segmentation from natural-language prompts. The conversation covers its real-time performance, the massive SACO benchmark of over 200k concepts, and how SAM 3 transforms data annotation, cutting annotation time from two minutes to just 25 seconds. Joseph Nelson from Roboflow shares insights on real-world applications in fields such as cancer research, and on automating complex visual reasoning.

Concept Prompts Replace Manual Clicks

  • SAM 3 introduces concept prompts, which find and segment every instance of a category from a short natural-language phrase or from visual exemplars.
  • This lets users detect and track objects across images and video without manually clicking on each instance.

Latency Scales With Objects And GPUs

  • SAM 3 runs image inference in about 30 ms on an H200, even with large object counts, and scales to real-time video with multi-GPU setups.
  • Video throughput depends on the number of tracked objects and on parallel inference across GPUs.
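To make the scaling claim concrete, here is a toy latency model. It is a sketch under loudly stated assumptions, not SAM 3's measured behavior: each frame is assumed to pay a fixed cost `base_ms` plus a cost `per_object_ms` for every tracked object, and tracked objects are assumed to be split evenly across GPUs that run in parallel. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
import math
from typing import Optional


def per_frame_latency_ms(num_objects: int, num_gpus: int,
                         base_ms: float, per_object_ms: float) -> float:
    """Hypothetical model: every GPU pays the fixed per-frame cost (base_ms)
    concurrently, then handles its share of the tracked objects at
    per_object_ms each. Wall-clock latency is set by the busiest GPU."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    objects_per_gpu = math.ceil(num_objects / num_gpus)
    return base_ms + per_object_ms * objects_per_gpu


def min_gpus_for_realtime(num_objects: int, base_ms: float, per_object_ms: float,
                          fps: float = 30.0, max_gpus: int = 64) -> Optional[int]:
    """Smallest GPU count whose modeled per-frame latency fits the frame budget
    (1000 / fps ms), or None if no count up to max_gpus suffices."""
    budget_ms = 1000.0 / fps
    for gpus in range(1, max_gpus + 1):
        if per_frame_latency_ms(num_objects, gpus, base_ms, per_object_ms) <= budget_ms:
            return gpus
    return None


# Hypothetical numbers: 100 tracked objects, 10 ms fixed cost, 1 ms per object.
print(min_gpus_for_realtime(100, base_ms=10.0, per_object_ms=1.0))  # → 5
```

Under this model, adding GPUs only shrinks the per-object term, so the fixed per-frame cost sets a floor: if `base_ms` alone exceeds the frame budget, no number of GPUs reaches real time.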

Benchmarks Expanded To 200k Concepts

  • SAM 3 required redefining benchmarks, expanding from ~1.2k concepts to the 200k+ concepts in SACO to match the diversity of natural language.
  • This larger benchmark enabled training and evaluation of open-vocabulary, exhaustive segmentation.