Fresh off impressive releases at Google’s I/O event, three Google Labs leaders explain how they’re reimagining creative tools and productivity workflows. Thomas Iljic details how video generation is merging filmmaking with gaming through generative AI cameras and world-building interfaces in Whisk and Veo. Jaclyn Konzelmann demonstrates how Project Mariner evolved from a disruptive browser takeover to an intelligent background assistant that remembers context across multiple tasks. Simon Tokumine reveals NotebookLM’s expansion beyond viral audio overviews into a comprehensive platform for transforming information into personalized formats. The conversation explores the shift from prompting to showing and telling, the economics of AI-powered e-commerce, and why being “too early” has become Google Labs’ biggest challenge and advantage.
Hosted by Sonya Huang, Sequoia Capital
00:00 Introduction
02:12 Google's AI models and public perception
04:18 Google's history in image and video generation
06:45 Where Whisk and Flow fit
10:30 How close are we to having the ideal tool for the craft?
13:05 Where do the movie and game worlds start to merge?
16:25 Introduction to Project Mariner
17:15 How Mariner works
22:34 Mariner user behaviors
27:07 Temporary tattoos and URL memory
27:53 Project Mariner's future
29:26 Agent capabilities and use cases
31:09 E-commerce and agent interaction
35:03 NotebookLM evolution
48:26 Predictions and future of AI
Mentioned in this episode:
- Whisk: Image and video generation app for consumers
- Flow: AI-powered filmmaking with the new Veo 3 model
- Project Mariner: Research prototype exploring the future of human-agent interaction, starting with browsers
- NotebookLM: Tool for understanding and engaging with complex information, including Audio Overviews and now a mobile app
- Shop with AI Mode: Shopping app with a virtual try-on tool based on your own photos
- Stitch: New prompt-based interface for designing UI for mobile and web applications
- ControlNet paper: Outlined an architecture for adding conditional language to direct the outputs of image generation with diffusion models