

Building real-time voice applications with Live API
Aug 6, 2025
Shrestha Basu Mallick, product lead for the Gemini API at Google, dives into the Gemini Live API and its integration of real-time audio capabilities. She discusses how proactive audio and async functions enhance user interaction. Topics include the importance of audio as an interface, imaginative use cases in applications like Photoshop, and some lighthearted banter about the constellation Gemini and development quirks. It's a vibrant conversation about innovation, creativity, and developer insights.
AI Snips
Audio: The Natural Interface
- Audio is a natural interface modality because humans learn to speak before reading.
- Speaking usually conveys information faster than typing, making audio highly efficient.
Speed Vs. Precision in Audio
- Speaking is faster but less precise than typing, presenting a trade-off.
- Native audio still needs refinement, but it will get better at understanding the nuances of human speech.
Using TTS for Wine Studies
- A friend used multi-speaker TTS to create podcasts that helped him study for wine exams.
- This practical use of TTS shows creative applications in learning.