Segment Anything Model and the Hard Problems of Computer Vision — with Joseph Nelson of Roboflow

Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and all things Software 3.0

CHAPTER

The Importance of Image Encoding in Modeling

The model was trained on 11 million images with 1.1 billion masks, and the heavy part is an image encoder run over all of those images. The much lighter parts then ask: given that image encoding, how do I interact with it and understand what's inside it? That's where the mask decoder comes into play in the model architecture. What's really cool is that there are prompts for describing the thing you're interested in, but you can also just point and click and say, "this is the part of the image I'm interested in." That's exactly what a labeling interface needs, for example.
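To make the "encode once, decode per click" split concrete, here is a toy sketch in Python. It is not SAM. The class `ToyPromptableSegmenter` is hypothetical, and its `set_image` / `predict` method names are only chosen to echo the shape of an interactive predictor. Here the "encoder" simply quantizes pixel brightness into a few coarse labels, and the "decoder" flood-fills outward from the clicked point. It still shows the pattern described in the excerpt: the expensive step runs once per image, and each point-and-click prompt only pays for the cheap step.

```python
from collections import deque
from typing import List, Optional, Tuple

Image = List[List[int]]   # grayscale pixels, 0-255
Mask = List[List[bool]]


class ToyPromptableSegmenter:
    """Hypothetical toy (not SAM): a 'heavy' encode step runs once per image,
    then a 'light' decode step answers each point-click prompt."""

    def __init__(self, levels: int = 4) -> None:
        self.levels = levels
        self.encoder_calls = 0
        self._embedding: Optional[List[List[int]]] = None

    def set_image(self, image: Image) -> None:
        # Stand-in for the expensive image encoder, run once per image:
        # map each pixel to one of `levels` coarse brightness labels.
        self.encoder_calls += 1
        step = 256 // self.levels
        self._embedding = [
            [min(px // step, self.levels - 1) for px in row] for row in image
        ]

    def predict(self, point: Tuple[int, int]) -> Mask:
        # Stand-in for the lightweight decoder, run per click: flood-fill
        # from the (row, col) point over 4-connected pixels sharing its label.
        if self._embedding is None:
            raise RuntimeError("call set_image() before predict()")
        emb = self._embedding
        h, w = len(emb), len(emb[0])
        r0, c0 = point
        target = emb[r0][c0]
        mask = [[False] * w for _ in range(h)]
        mask[r0][c0] = True
        queue = deque([(r0, c0)])
        while queue:
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and not mask[nr][nc] and emb[nr][nc] == target:
                    mask[nr][nc] = True
                    queue.append((nr, nc))
        return mask


# Usage: encode once, then answer two different clicks from the cached encoding.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 10, 10],
]
seg = ToyPromptableSegmenter()
seg.set_image(image)          # "encoder" runs once
left = seg.predict((0, 0))    # click on the dark region
right = seg.predict((0, 3))   # click on the bright region
print(seg.encoder_calls)      # 1
```

The design point is the caching: in a labeling interface, a user may click many times on the same image, so keeping the per-click step cheap (and reusing the stored encoding) is what makes the interaction feel immediate.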