

Behind the scenes of Google's state-of-the-art "nano-banana" image model
Aug 27, 2025
Nicole Brichtova and Mostafa Dehghani from Google's Gemini team discuss their image model, Gemini 2.5 Flash Image, nicknamed "nano-banana". They explain how interleaved generation lets the model carry out intricate, multi-step edits and how it maintains character consistency from one image to the next. A live demo shows the playful 'nano-banana' idea in action, transforming a photo in real time. The duo also reflects on text rendering and on how user feedback will shape future work on image generation.
AI Snips
Big Quality Leap And Creative Interpretation
- Gemini 2.5 Flash Image shows major quality gains in both image generation and editing, enabling natural multi-turn conversations.
- The model creatively interprets vague prompts while keeping scene coherence and subject identity.
Nano Banana Demo With Logan
- Nicole edited Logan's photo into a 'nano banana' costume using a short, vague prompt.
- The model kept Logan's face recognizably the same while inventing a cohesive new scene.
Text Rendering As A Quality Signal
- Text rendering serves as a reliable proxy metric for the overall structural quality of generated images during training.
- Tracking this metric prevents regressions and reveals unexpected beneficial changes.
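To make the idea of tracking a text-rendering metric concrete, here is a minimal sketch in Python. It is not the team's actual evaluation: it assumes that some OCR step (not shown) has already read the text out of each generated image, scores that text against the text the prompt asked for using edit distance, and flags a checkpoint whose average score falls below the best one seen so far. The function names, the 0.02 tolerance, and the scoring formula are all hypothetical choices for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def text_score(target: str, recognized: str) -> float:
    """Return 1.0 for an exact match, falling toward 0.0 as edits accumulate."""
    if not target and not recognized:
        return 1.0
    longest = max(len(target), len(recognized))
    return 1.0 - edit_distance(target, recognized) / longest


def mean_score(pairs) -> float:
    """Average text_score over (target, recognized) pairs from one checkpoint."""
    return sum(text_score(t, r) for t, r in pairs) / len(pairs)


def regressed(history, current, tolerance=0.02) -> bool:
    """True if `current` is more than `tolerance` below the best earlier score.

    The tolerance is an assumed value to absorb evaluation noise.
    """
    return bool(history) and current < max(history) - tolerance


# Hypothetical usage: `recognized` strings would come from an OCR pass
# over images generated by the checkpoint under evaluation.
pairs = [("OPEN 24 HOURS", "OPEN 24 HOURS"), ("BAKERY", "BAKERV")]
history = [0.80, 0.90]  # scores from earlier checkpoints (made-up numbers)
score = mean_score(pairs)
if regressed(history, score):
    print("text rendering regressed at this checkpoint")
history.append(score)
```

A single scalar like this is cheap to log at every evaluation step, which is what makes it usable as a regression alarm; it says nothing on its own about why quality moved.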