Tulsee Doshi, model product lead for Gemini at Google, shares insights on the groundbreaking Gemini 2.0. She discusses the model's significant improvements over its predecessor, including enhanced multimodal capabilities and native tool use, which boost productivity in Google products. Doshi highlights the thrill of launching experimental models while emphasizing the importance of user feedback in refining AI technology. The conversation also unveils innovations like function calling and sophisticated AI agents that lead to richer, personalized user experiences.
Gemini 2.0 enhances user interaction by integrating multimodal capabilities, allowing for seamless task execution and improved performance.
The model's progression reflects a year of significant advances in tool use, with an emphasis on accurate responses and fewer hallucinations.
Deep dives
Introduction of Gemini 2.0 and Its Capabilities
Gemini 2.0 introduces a range of new capabilities aimed at enhancing user interaction through multimodal agents. This version includes features like screen and spatial understanding, as well as the ability to utilize native search tools. These advancements allow for more seamless integration of tasks, combining reasoning and actions in a way that significantly improves performance over its predecessor. The introduction of 2.0 Flash, in particular, emphasizes practicality and speed, making it suitable for real-time applications and enhancing developer experiences.
Growth and Development of Gemini Over the Past Year
Reflecting on the journey of Gemini since its launch, significant progress has been made within just one year. The initial model faced numerous challenges, but the team has since streamlined processes and built confidence in shipping new versions regularly. The evolution from Gemini 1.0 to 2.0 illustrates enhanced clarity regarding optimal use cases and metrics for success. By incorporating feedback from developers and enterprise customers, the model has matured into a tool that is now integral to Google's product ecosystem.
Native Tool Use and Its Impact on Model Factuality
Native tool use in Gemini 2.0 markedly improves the model's ability to give accurate, contextually relevant responses. Because the model itself decides when to call external tools such as search, its answers are more factually reliable and it hallucinates less. Training covers not just function calling but also how to use these tools effectively in response to user prompts. The result is a noticeable improvement in both output quality and user experience.
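As a rough illustration of the "model decides when to call a tool" pattern described above, here is a minimal Python sketch. It does not use the Gemini API: `toy_model`, `search`, `TOOLS`, and `run` are hypothetical stand-ins invented for this example, and the decision rule is a hard-coded stub rather than anything a real model does.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for an external search tool; returns canned text only.
def search(query: str) -> str:
    canned = {"capital of france": "Paris is the capital of France."}
    return canned.get(query.lower(), "no results")

# Hypothetical tool registry mapping tool names to callables.
TOOLS: Dict[str, Callable[[str], str]] = {"search": search}

def toy_model(prompt: str, tool_result: Optional[str] = None) -> dict:
    """Stub that mimics a model's choice between answering and calling a tool."""
    if tool_result is not None:
        # Second pass: ground the final answer in what the tool returned.
        return {"type": "answer", "text": f"According to search: {tool_result}"}
    if prompt.lower().startswith("what is the"):
        # The stub treats factual-looking questions as needing a lookup.
        query = prompt[len("what is the"):].strip(" ?")
        return {"type": "tool_call", "name": "search", "argument": query}
    return {"type": "answer", "text": "Hello!"}

def run(prompt: str) -> str:
    """Ask the stub model; if it requests a tool, run it and ask again."""
    step = toy_model(prompt)
    if step["type"] == "tool_call":
        result = TOOLS[step["name"]](step["argument"])
        step = toy_model(prompt, tool_result=result)
    return step["text"]

print(run("What is the capital of France?"))
print(run("Hi"))
```

The point of the sketch is the control flow: the caller never decides whether to search. It only executes whatever tool the model asks for and hands the result back, so the final answer can be grounded in retrieved text.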
Multimodal Generation and Future Potential
Gemini 2.0 expands beyond text processing to include native image and audio generation, fusing real-world knowledge with creative outputs. For instance, it can place generated objects within images based on contextual cues, sizing and positioning them appropriately relative to other items. The model can also generate audio in various styles, which makes interactions richer. The ongoing development of these features points towards greater automation and more dynamic agentic behavior in future applications.
Tulsee Doshi, Gemini model product lead, joins host Logan Kilpatrick to go behind the scenes of Gemini 2.0, taking a deep dive into the model's multimodal capabilities and native tool use, and Google's approach to shipping experimental models.
Watch on YouTube: https://www.youtube.com/watch?v=L7dw799vu5o
Chapters:
Meet Tulsee Doshi
Gemini's Progress Over the Past Year
Introducing Gemini 2.0
Shipping Experimental Models
Gemini 2.0’s Native Tool Use
Function Calling
Multimodal Agents
Rapid Fire Questions