
5. Charlie Snell on DALL-E and CLIP
The Inside View
Decoder
The decoder is effectively a one-to-one map from code to image. Technically it does output a distribution over images, but in practice people almost always just take the argmax, so it can be seen as one code to one image. Once you have this discrete sequence, you can input it into a language model, and the language model embeds the discrete sequence just like it does with language tokens.
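The two ideas in this answer can be illustrated with a toy sketch. Every name and size below (`TEXT_VOCAB`, `IMAGE_CODES`, `toy_decoder_distribution`, and so on) is a hypothetical assumption, not DALL-E's actual code. The decoder is a made-up stand-in that returns a distribution per code. Taking the argmax makes the code-to-output mapping deterministic. The image codes are shifted into the same id space as text tokens, so a single embedding lookup treats both kinds of token the same way.

```python
import random

# Hypothetical toy sizes (assumed); real models use far larger vocabularies.
TEXT_VOCAB = 10    # number of ordinary language tokens
IMAGE_CODES = 4    # size of the discrete image codebook
EMBED_DIM = 3

random.seed(0)
# One shared embedding table: rows 0..TEXT_VOCAB-1 hold text tokens,
# rows TEXT_VOCAB onward hold image codes shifted into the same id space.
embedding_table = [
    [random.uniform(-1, 1) for _ in range(EMBED_DIM)]
    for _ in range(TEXT_VOCAB + IMAGE_CODES)
]

def image_code_to_token_id(code):
    """Shift an image code so it does not collide with text token ids."""
    return TEXT_VOCAB + code

def embed(token_ids):
    """Look up each id in the table, exactly as text tokens are embedded."""
    return [embedding_table[t] for t in token_ids]

def toy_decoder_distribution(code, num_pixel_values=5):
    """Hypothetical decoder stand-in: a probability distribution over
    pixel values for one code, peaked at the value equal to the code."""
    scores = [1.0 / (1 + abs(v - code)) for v in range(num_pixel_values)]
    total = sum(scores)
    return [s / total for s in scores]

def decode_argmax(codes):
    """Keep only the most likely value per code, so a given code
    sequence always maps to the same output."""
    outputs = []
    for c in codes:
        probs = toy_decoder_distribution(c)
        outputs.append(max(range(len(probs)), key=probs.__getitem__))
    return outputs

text_ids = [3, 7]
image_codes = [0, 2, 1, 3]
sequence = text_ids + [image_code_to_token_id(c) for c in image_codes]
vectors = embed(sequence)
print(sequence)                    # [3, 7, 10, 12, 11, 13]
print(decode_argmax(image_codes))  # [0, 2, 1, 3]
```

The offset-into-one-vocabulary choice is just one simple way to let a language model consume image codes alongside text. The key point is that, once images are discrete ids, the embedding step is identical for both.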