The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Gen AI at the Edge: Qualcomm AI Research at CVPR 2024 with Fatih Porikli - #688

Jun 10, 2024
Fatih Porikli, Senior Director of Technology at Qualcomm AI Research, discusses recent advances in generative AI and computer vision, including efficient diffusion models for text-to-image generation and real-time 360° image relighting. The conversation also covers applications such as a video-language model for personalized fitness coaching and a Math Search dataset for visual reasoning. Porikli also touches on practical demos at CVPR that showcase multi-modal models and improved AI capabilities for mobile and edge devices.
INSIGHT

Clockwork U-Nets Insight

  • The middle layers of a U-Net used for text-to-image generation can be approximated to save compute.
  • This works because small perturbations in these layers affect overall composition, not fine textures.
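
As a rough illustration of the idea above, here is a minimal sketch in which the expensive middle block of a U-Net is recomputed only on some denoising steps and its previous output is reused on the others. Everything in it is an assumption made for illustration: `ToyUNet`, `denoise`, the `clock` parameter, and the update rule are hypothetical and are not the method discussed in the episode, and plain reuse of cached features stands in for whatever approximation the actual work uses.

```python
import numpy as np


class ToyUNet:
    """Hypothetical stand-in for a U-Net: cheap outer layers, costly middle block."""

    def __init__(self, dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.w_mid = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.middle_calls = 0  # counts how often the costly block actually runs

    def encode(self, x):
        return np.tanh(x)

    def middle(self, h):
        self.middle_calls += 1
        return np.tanh(h @ self.w_mid)

    def decode(self, h, mid):
        # Combine the skip connection with the (possibly stale) middle features.
        return 0.5 * (h + mid)


def denoise(unet, x, num_steps=8, clock=2):
    """Run a toy denoising loop, recomputing the middle block every `clock` steps."""
    cached_mid = None
    for step in range(num_steps):
        h = unet.encode(x)
        if cached_mid is None or step % clock == 0:
            cached_mid = unet.middle(h)  # full computation on this step
        # On the other steps the cached middle features are reused as an approximation.
        x = x - 0.1 * unet.decode(h, cached_mid)
    return x


unet = ToyUNet()
result = denoise(unet, np.ones((1, 8)), num_steps=8, clock=2)
print(unet.middle_calls)
```

With `clock=2` the middle block runs on every other step, so half of its calls are skipped; `clock=1` recovers the full computation on every step.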
ANECDOTE

Actionable Feedback

  • Standard LLMs provide basic feedback on actions like squats (e.g., "Successfully completed").
  • The "Look, Remember, and Reason" model offers specific, actionable feedback (e.g., "Smooth on the way down").
INSIGHT

360° Image Generation Insight

  • Text-conditioned 360° HDR images can enhance video portrait relighting.
  • This allows dynamic lighting changes by rotating a generated environment around the subject.
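
The rotation step can be sketched as follows, assuming the generated environment is stored as an equirectangular image, where a rotation about the vertical axis is a circular shift of the pixel columns. The function name `rotate_environment`, the array layout, and the direction convention are assumptions for illustration; the episode does not specify them.

```python
import numpy as np


def rotate_environment(env_map, yaw_degrees):
    """Rotate an equirectangular map of shape (H, W, 3) about the vertical axis.

    Assumes the width spans 360 degrees of azimuth, so a yaw rotation is a
    circular shift of the columns. The sign of the shift is a convention.
    """
    width = env_map.shape[1]
    shift = int(round(yaw_degrees / 360.0 * width)) % width
    return np.roll(env_map, shift, axis=1)


# Toy environment: a single bright column acts as the light source.
env = np.zeros((4, 8, 3))
env[:, 0, :] = 1.0
rotated = rotate_environment(env, 90)
```

Sweeping `yaw_degrees` over time moves the light source around the subject without regenerating the environment map.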