Bibo Xu is a Product Manager at Google DeepMind who leads multimodal modeling for Gemini. This video traces Google AI’s journey from basic voice commands to advanced dialogue systems that comprehend not just what is said, but also tone, emotion, and visual context. Check out this conversation for a deeper understanding of the challenges and opportunities in integrating diverse AI capabilities to create universal assistants.
Chapters:
0:00 - Intro
1:43 - Introducing Bibo Xu
2:40 - Bibo’s journey: From business school to voice AI
3:59 - The genesis of Google Assistant and Google Home
6:50 - Milestones in speech recognition technology
13:30 - Shifting from command-based AI to natural dialogue
19:00 - The power of multimodal AI for human interaction
21:20 - Real-time multilingual translation with LLMs
25:20 - Project Astra: Building a universal assistant
28:40 - Developer challenges in multimodal AI integration
29:50 - Unpacking the "can't see" debugging story
35:10 - The importance of low latency and interruption
38:30 - Seamless dialogue and background noise filtering
40:00 - Redefining human-computer interaction
41:00 - Ethical considerations for humanlike AI
44:00 - Responding to user emotions and frustration
45:50 - Politeness and expectations in AI conversations
49:10 - AI as a catalyst for research and automation
52:00 - The future of AI assistants and tool use
52:40 - AI interacting with interfaces
54:50 - Transforming the future of work and communication
55:19 - AI for enhanced writing and idea generation
57:13 - Conclusion and future outlook for AI development
Subscribe to Google for Developers → https://goo.gle/developers
Speakers: Bibo Xu, Christina Warren, Ashley Oldacre
Products Mentioned: Google AI, Gemini, Generative AI, Android, Google Home, Google Voice, Project Astra, Gemini Live, Google DeepMind