5. Charlie Snell on DALL-E and CLIP

The Inside View

Is There a Transition About CLIP?

CLIP takes in a bunch of images paired with their text prompts, so there's a big dataset of image and text prompt pairs. And CLIP basically learns an encoder to encode each image into a vector and each text prompt into a vector. Its goal is to make it so that, for a given image, the image vector and the vector of its corresponding text align. You want the text prompts that don't match an image to not align with it.
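To make the "align the matching pairs, push apart the rest" idea concrete, here is a minimal sketch of a symmetric contrastive loss over a small batch. It is an illustration under stated assumptions, not CLIP's actual training code: the vectors stand in for the outputs of the image and text encoders, the function names are made up for this example, and the temperature of 0.07 is only an assumed starting value.

```python
import math

def normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_entropy(logits, target):
    # Numerically stable -log(softmax(logits)[target]).
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def contrastive_loss(image_vecs, text_vecs, temperature=0.07):
    # Hypothetical helper: image_vecs[k] and text_vecs[k] are assumed to be
    # the encoder outputs for the k-th (image, text) pair in the batch.
    imgs = [normalize(v) for v in image_vecs]
    txts = [normalize(v) for v in text_vecs]
    n = len(imgs)
    # sim[i][j] = scaled cosine similarity between image i and text j.
    sim = [[dot(i, t) / temperature for t in txts] for i in imgs]
    # Each image should pick out its own text among all texts in the batch...
    image_to_text = sum(cross_entropy(sim[k], k) for k in range(n)) / n
    # ...and each text should pick out its own image among all images.
    cols = [[sim[r][c] for r in range(n)] for c in range(n)]
    text_to_image = sum(cross_entropy(cols[k], k) for k in range(n)) / n
    return (image_to_text + text_to_image) / 2

# Toy stand-ins for encoder outputs (made-up numbers, not real embeddings).
images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned_texts = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
shuffled_texts = aligned_texts[1:] + aligned_texts[:1]  # every pair mismatched

aligned_loss = contrastive_loss(images, aligned_texts)
shuffled_loss = contrastive_loss(images, shuffled_texts)
print(aligned_loss < shuffled_loss)
```

The loss is low when each image vector points the same way as its own text vector and high when the pairs are shuffled, which is the behavior the encoders are trained toward.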
