The examples of understanding images in the live launch stream they did definitely blew my mind. They have some really interesting techniques, which I imagine, you know, OpenAI is doing some similar stuff. Their approach involves training a connector model to essentially translate an image encoding into the latent space of the text model. It's actually predicting the embeddings and kind of injecting them directly into context.
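The connector idea Riley describes can be sketched very roughly as follows. This is a minimal illustration, not the actual architecture: the dimensions, the single linear projection, and all variable names are assumptions made for the example.

```python
import numpy as np

# Hypothetical dimensions chosen for illustration only.
IMG_DIM, TXT_DIM = 1024, 4096        # image-encoder vs. text-model embedding sizes
N_IMG_TOKENS, N_TXT_TOKENS = 16, 8   # image patch tokens vs. text tokens

rng = np.random.default_rng(0)

# Stand-in for a frozen image encoder's output: one vector per image patch.
image_features = rng.standard_normal((N_IMG_TOKENS, IMG_DIM))

# The "connector" here is a single learned linear projection, a common
# minimal choice; real systems may use an MLP or cross-attention instead.
W_connector = rng.standard_normal((IMG_DIM, TXT_DIM)) * 0.02

# Predict embeddings in the text model's latent space...
image_embeddings = image_features @ W_connector

# ...and inject them directly into context by prepending them to the
# text-token embeddings, so the language model attends over both.
text_embeddings = rng.standard_normal((N_TXT_TOKENS, TXT_DIM))
context = np.concatenate([image_embeddings, text_embeddings], axis=0)

print(context.shape)  # (24, 4096): image tokens followed by text tokens
```

In a real system the connector's weights would be trained (e.g. on image-caption pairs) so the projected embeddings land where the text model expects meaningful tokens; here the weights are random purely to show the data flow.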
This is a special preview episode of The Cognitive Revolution: How AI Changes Everything. Hosted by Erik Torenberg and Nathan Labenz, TCR features in-depth interviews with the creators, builders, and thinkers pushing the bleeding edge of AI. On this episode, they talk with Riley Goodside, the first Staff Prompt Engineer at Scale AI and an expert in prompting LLMs and integrating them into AI applications.
Check out The Cognitive Revolution, the perfect AI interview complement to The AI Breakdown: https://link.chtbl.com/TheCognitiveRevolution
Find TCR on YouTube: https://www.youtube.com/@CognitiveRevolutionPodcast