Latent Space: The AI Engineer Podcast

[AIEWF Preview] Gemini in 2025 and Realtime Voice AI

Jun 2, 2025
Logan Kilpatrick, a product lead at Google AI Studio, dives into the latest Gemini developments, including implicit context caching and the potential of Gemini Diffusion for generative UIs. Shrestha Basu Mallick, an API product manager, highlights the challenges of live APIs and praises innovations like multilingual TTS and URL Context. Kwindla Hultman Kramer, CEO of Daily, discusses the importance of low-latency audio/video and introduces proactive audio models that filter out irrelevant speech. The trio discusses future capabilities and the need for greater developer control.

Controlling Model Reasoning

  • Gemini 2.5 introduces thinking budgets and thought summaries to give developers finer control over reasoning behavior.
  • These features let developers trade reasoning depth against cost and gain visibility into the model's thinking.

Multimodal And Context Retrieval

  • Native audio output and language-switching TTS broaden multimodal app potential and user personalization.
  • URL Context unlocks respectful, deeper retrieval from web pages for research-style agents.

Personal Demo: Multilingual TTS Joy

  • Shrestha described her delight at hearing native audio switch between Bengali and English in demos.
  • She highlighted a demo that spoke Klingon to show the model's language flexibility, even though Klingon is not a supported language.