
Latent Space: The AI Engineer Podcast
SAM 3: The Eyes for AI — Nikhila & Pengchuan (Meta Superintelligence), ft. Joseph Nelson (Roboflow)
Dec 18, 2025
Nikhila Ravi leads the Segment Anything project at Meta, with Pengchuan Zhang contributing as a researcher specializing in vision models. They discuss SAM 3, which enables concept segmentation using natural-language prompts. The conversation covers its real-time performance, the SACO benchmark of over 200k concepts, and how SAM 3 speeds up data annotation, reducing annotation time from two minutes to just 25 seconds. Joseph Nelson from Roboflow shares insights on real-world applications in fields like cancer research and the automation of complex visual reasoning.
AI Snips
Concept Prompts Replace Manual Clicks
- SAM 3 introduces concept prompts to find and segment all instances of a category using short natural-language phrases or visual exemplars.
- This lets users detect and track objects across images and video without clicking every instance manually (see the sketch below).
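
The workflow described in this snip is easiest to see as code. The sketch below is purely illustrative: `ConceptPrompt`, `InstanceMask`, and `segment_concept` are hypothetical names, not the released SAM 3 API. It only shows the contract the snip describes: one short phrase (plus optional visual exemplars) in, a mask for every matching instance out, with no per-object clicks.

```python
# Minimal sketch of concept-prompt segmentation; NOT the official SAM 3 API.
# All names here are hypothetical stand-ins for illustration only.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class ConceptPrompt:
    phrase: str                                          # short noun phrase, e.g. "yellow school bus"
    exemplar_boxes: list = field(default_factory=list)   # optional visual exemplars (xyxy boxes)

@dataclass
class InstanceMask:
    mask: np.ndarray   # HxW boolean mask for one instance
    score: float       # model confidence for this instance

def segment_concept(image: np.ndarray, prompt: ConceptPrompt) -> list[InstanceMask]:
    """Hypothetical predictor: returns one mask per instance of the prompted concept."""
    h, w = image.shape[:2]
    # Placeholder output (two dummy instances) so the example runs end to end.
    return [InstanceMask(mask=np.zeros((h, w), dtype=bool), score=0.9),
            InstanceMask(mask=np.zeros((h, w), dtype=bool), score=0.8)]

if __name__ == "__main__":
    image = np.zeros((480, 640, 3), dtype=np.uint8)
    instances = segment_concept(image, ConceptPrompt(phrase="yellow school bus"))
    print(f"found {len(instances)} instances")  # every instance, no per-object clicking
```

The key contrast with SAM 1/2-style prompting is that the prompt names a concept rather than pointing at a single object, so the output is a set of masks per image (or per video frame) instead of one.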
Latency Scales With Objects And GPUs
- SAM 3 runs image inference in roughly 30 ms per frame on an H200, even with large object counts, and scales to real-time video with multi-GPU setups.
- Video throughput depends on the number of tracked objects and on parallelizing inference across GPUs (a rough estimate is sketched below).
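
The quoted ~30 ms/frame figure suggests a simple way to reason about video throughput. The sketch below is a back-of-envelope estimate, not a measured benchmark: it assumes per-frame latency grows roughly linearly with the number of tracked objects and that work parallelizes cleanly across GPUs, and the per-object cost constant is made up for illustration.

```python
# Back-of-envelope video throughput estimate under stated assumptions; not a benchmark.
def estimated_fps(base_ms: float = 30.0,        # ~30 ms/frame quoted for an H200
                  per_object_ms: float = 1.0,   # hypothetical marginal cost per tracked object
                  num_objects: int = 20,
                  num_gpus: int = 1) -> float:
    frame_ms = base_ms + per_object_ms * num_objects
    return 1000.0 / frame_ms * num_gpus          # assumes near-linear scaling across GPUs

for gpus in (1, 2, 4):
    print(f"{gpus} GPU(s): ~{estimated_fps(num_objects=20, num_gpus=gpus):.0f} fps")
# Under these assumptions: 1 GPU -> ~20 fps, 2 GPUs -> ~40 fps, 4 GPUs -> ~80 fps.
```

This is why the snip ties real-time video to both object count and GPU count: either more objects per frame or fewer GPUs pushes the effective frame rate down.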
Benchmarks Expanded To 200k Concepts
- SAM 3 required redefining benchmarks, expanding from ~1.2k concepts to SACO's 200k+ concepts to match the diversity of natural language.
- This larger benchmark enabled training and evaluating open-vocabulary, exhaustive segmentation (illustrated below).
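
To make "open-vocabulary, exhaustive" concrete, the sketch below shows what one benchmark record might look like. The field names are hypothetical, not the actual SACO schema: the point is that each (concept phrase, image) pair is annotated with all matching instances, so an empty list asserts "none present" rather than "unlabeled".

```python
# Illustrative record for an exhaustive open-vocabulary benchmark; hypothetical schema, not SACO's.
from dataclasses import dataclass

@dataclass
class ConceptAnnotation:
    concept: str                 # free-form noun phrase, one of 200k+ concepts
    image_id: str
    instance_masks: list[list]   # one encoded mask per instance; empty = verified absent

record = ConceptAnnotation(
    concept="striped umbrella",
    image_id="img_000123",
    instance_masks=[],           # exhaustive labeling: absence is an explicit claim
)
print(record.concept, len(record.instance_masks), "instances")
```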

