Manipulating and Combining Text and Images in Embedding Spaces | 1min snip from "The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

Interfacing with AI, with Linus Lee of Notion

NOTE

Manipulating and Combining Text and Images in Embedding Spaces

The speaker's demos use CLIP, a model that embeds images and text in a shared latent space, to combine the two by adding their vectors in that space and generating new outputs from the result, such as a human face with an added emotion or a photograph in an added style. Text can likewise be embedded into a latent space, manipulated there, and converted back to text with minimal loss of meaning. By manipulating text embeddings, the speaker can semantically combine sentences and interpolate between sentences with different tones, generating sentences that lie between the originals. These manipulations offer non-verbal ways to edit text and images and to control outputs through abstract concepts. The speaker has also been working on more rigorous versions of these techniques for better precision, and has built creative front-end experiences to showcase them.
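The vector arithmetic described in this note can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the speaker's implementation: the 4-dimensional vectors are toy stand-ins for real embeddings, the function names (`add_concept`, `interpolate`, `cosine`) are hypothetical, and the encoder and decoder that map text or images to and from the latent space are not shown.

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def add_concept(base, concept, strength=1.0):
    """Shift `base` toward `concept` by adding a scaled copy of it."""
    return normalize(normalize(base) + strength * normalize(concept))

def interpolate(a, b, steps=5):
    """Return `steps` unit vectors evenly spaced along the line from a to b.

    Assumes a and b do not point in exactly opposite directions.
    """
    a, b = normalize(a), normalize(b)
    return [normalize((1 - t) * a + t * b) for t in np.linspace(0.0, 1.0, steps)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(normalize(u), normalize(v)))

# Toy stand-ins (assumption): a real demo would get these from an encoder.
face = np.array([1.0, 0.0, 0.0, 0.0])   # e.g. embedding of a face image
smile = np.array([0.0, 1.0, 0.0, 0.0])  # e.g. embedding of an emotion word

# "Adding an emotion to a face": move the face vector toward the concept.
happy_face = add_concept(face, smile, strength=0.5)

# "Sentences that lie between the originals": points between two embeddings.
path = interpolate(face, smile, steps=5)
```

In a full pipeline, each vector in `path` would be passed to a decoder to produce the in-between sentence or image. The `strength` parameter controls how far the result moves toward the added concept.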
