

gpt-realtime, nano banana & workspace computer v2 | EP99.15-realtime
104 snips Aug 29, 2025
This episode covers the real-time voice capabilities of gpt-realtime alongside the launch of Gemini 2.5. The discussion pokes fun at the pricing of advanced models while stressing their potential to transform a range of industries. Listeners also hear how cloud-based workspaces are evolving and how tools like SimLink can improve productivity. The episode closes with a look at PixVerse V5 and its impressive video transitions.
AI Snips
Real-Time API Enables Delegating Voice Agents
- gpt-realtime adds image inputs, SIP voice calling, and remote MCP support to enable richer voice agents.
- Asynchronous tool calls let a lightweight voice model coordinate powerful background assistants for complex tasks.
Orchestrate Assistants To Reduce Cost And Latency
- Use a real-time voice front-end that delegates heavy work asynchronously to specialist assistants.
- Return only concise summaries to the voice model to keep latency, cost and hallucinations low.
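The delegation pattern in the two snips above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the API discussed in the episode: `research_assistant`, `summarize`, `voice_front_end`, and the 120-character budget are all hypothetical names and values, and plain truncation stands in for a real summarization step.

```python
import asyncio

MAX_SUMMARY_CHARS = 120  # hypothetical budget for what the voice model receives


async def research_assistant(task: str) -> str:
    """Hypothetical specialist assistant standing in for a slow, powerful model."""
    await asyncio.sleep(0.05)  # simulate heavy background work
    return f"Full report on {task}: " + "detail " * 200


def summarize(report: str, limit: int = MAX_SUMMARY_CHARS) -> str:
    """Naive truncation used as a placeholder for a real summarization step."""
    if len(report) <= limit:
        return report
    return report[: limit - 3].rstrip() + "..."


async def voice_front_end(task: str) -> list[str]:
    """Delegate the task, acknowledge right away, then relay a short summary."""
    spoken: list[str] = []
    # Schedule the heavy work without blocking the voice loop.
    job = asyncio.create_task(research_assistant(task))
    # Acknowledge immediately instead of waiting on the job.
    spoken.append("Working on that now.")
    report = await job
    # Only the concise summary reaches the voice model, not the full report.
    spoken.append(summarize(report))
    return spoken


def run_demo() -> list[str]:
    return asyncio.run(voice_front_end("Q3 pricing"))
```

Calling `run_demo()` returns the two utterances the front-end would speak: the immediate acknowledgement, then the trimmed summary. Keeping the full report out of the voice model's context is what the snip credits with holding down latency and cost.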
Live Demo: Multilingual, Accent-Switching Voice
- Michael and Chris demoed the Marin voice switching languages and accents seamlessly in a short clip.
- The model handled Spanish, Chinese and an Australian accent during rapid back-and-forth testing.