Decoder

The decoder is effectively a one to one map of code to image. On a very technical level it does output a distribution of images but again almost always people just take the argmax so it can be seen as just you know one code to one image yeah. Yeah once you have this discrete sequence you can then basically input it into a language model and the language model just embeds the discrete sequence like any like it does with you know language tokens.

Play episode from 27:31

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app