
5. Charlie Snell on DALL-E and CLIP

The Inside View

CHAPTER

Decoder

The decoder is effectively a one-to-one map from code to image. On a very technical level it does output a distribution over images, but people almost always just take the argmax, so it can be seen as one code to one image. Once you have this discrete sequence, you can input it into a language model, and the language model embeds the discrete sequence just as it does with language tokens.
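To make that concrete, here is a rough toy sketch in Python, not the actual DALL-E decoder. It assumes a tiny made-up codebook where each code has its own logits over a few pixel intensity levels. Every name and size here (`VOCAB_SIZE`, `decoder_logits_table`, `embedding_table`, and so on) is hypothetical and chosen only for illustration. It shows the two ideas from the passage: taking the argmax of the decoder's output distribution gives a deterministic code-to-image map, and the same discrete codes can be embedded by table lookup the way a language model embeds text tokens.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are hypothetical toy values, not DALL-E's real settings.
VOCAB_SIZE = 8     # number of discrete codes in the codebook
GRID = 2           # image is represented as a GRID x GRID grid of codes
PIXEL_LEVELS = 4   # intensity levels the toy decoder chooses between
EMBED_DIM = 3      # width of the toy language-model embedding

# Toy "decoder": each code has its own logits over pixel intensity levels.
# A real decoder is a neural network over the whole grid; this is a stand-in.
decoder_logits_table = rng.normal(size=(VOCAB_SIZE, PIXEL_LEVELS))

def decode_distribution(codes):
    """Return a softmax distribution over pixel levels for each position."""
    logits = decoder_logits_table[codes]
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

def decode_argmax(codes):
    """Take the most likely pixel level at each position (deterministic)."""
    return decode_distribution(codes).argmax(axis=-1).reshape(GRID, GRID)

# Toy language-model embedding table: one vector per discrete code,
# looked up exactly like a text-token embedding.
embedding_table = rng.normal(size=(VOCAB_SIZE, EMBED_DIM))

def embed(codes):
    """Embed a sequence of discrete codes by table lookup."""
    return embedding_table[codes]

codes = np.array([3, 0, 7, 3])  # a flattened 2x2 grid of codes
image = decode_argmax(codes)    # same codes always give the same image
embedded = embed(codes)         # shape (4, EMBED_DIM), ready for a sequence model
```

Because the argmax is deterministic, calling `decode_argmax(codes)` twice returns the same image, and repeated codes (positions 0 and 3 above) get identical embedding vectors.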
