"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis cover image

Interfacing with AI, with Linus Lee of Notion

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

NOTE

Manipulating and Combining Text and Images in Embedding Spaces

The speaker's demos use CLIP, a model that embeds images and text into a shared latent space, to combine the two by adding their vectors in that space. Decoding the summed vector produces new outputs, such as a human face with an added emotion or a photograph rendered in a different style. Text alone can be handled the same way: a sentence is embedded into a latent space, manipulated there, and then converted back to text with minimal loss of meaning. By manipulating text embeddings, the speaker can semantically combine sentences, interpolate between sentences with different tones, and generate sentences that lie between the originals. This opens up non-verbal ways of editing text and images, where outputs are controlled through abstract concepts rather than written instructions. The speaker has also been working on more rigorous versions of these techniques for better precision, and has built creative front-end experiences to showcase them.
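The vector arithmetic described above can be illustrated with a minimal sketch. It assumes embeddings are unit-length vectors compared by cosine similarity, and it uses random stand-in vectors in place of real CLIP or text-model outputs. The names `add_concept` and `interpolate` are hypothetical helpers chosen for this illustration, not the speaker's code, and the step of decoding a vector back into an image or sentence is not shown.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def add_concept(base, concept, strength=1.0):
    """Shift `base` toward `concept` by adding a scaled copy of the concept vector."""
    return normalize([b + strength * c for b, c in zip(base, concept)])


def interpolate(a, b, t):
    """Linear blend between two embeddings; t=0 returns a, t=1 returns b."""
    return normalize([(1.0 - t) * x + t * y for x, y in zip(a, b)])


# Stand-in embeddings (assumption: a real pipeline would get these from a model).
random.seed(0)
DIM = 64
face = normalize([random.gauss(0, 1) for _ in range(DIM)])   # e.g. an image of a face
smile = normalize([random.gauss(0, 1) for _ in range(DIM)])  # e.g. the text "smiling"

edited = add_concept(face, smile, strength=0.5)  # "face plus a bit of smile"
midpoint = interpolate(face, smile, 0.5)         # halfway between the two inputs

print("similarity to concept before:", round(dot(face, smile), 3))
print("similarity to concept after: ", round(dot(edited, smile), 3))
```

In a real setup the edited vector would be passed to a decoder to produce the new image or sentence; the quality of that result depends on the decoder and on how well the latent space supports linear edits, which this sketch does not model.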
