

Gen AI at the Edge: Qualcomm AI Research at CVPR 2024 with Fatih Porikli - #688
Jun 10, 2024
Fatih Porikli, Senior Director of Technology at Qualcomm AI Research, discusses recent advances in generative AI and computer vision. He covers efficient diffusion models for text-to-image generation and real-time 360° image relighting. The conversation also highlights applications such as a video-language model for personalized fitness coaching and a Math Search dataset for visual reasoning. Porikli also touches on the practical demos at CVPR, which showcase multi-modal models and improved AI capabilities for mobile and edge devices.
AI Snips
Clockwork UNETs Insight
- The middle layers of a U-Net used for text-to-image generation can be approximated to save computation.
- This works because small perturbations in these layers affect the overall composition rather than fine textures.
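The idea of approximating the middle layers can be sketched with a toy denoising loop. This is a minimal illustration under stated assumptions, not the method discussed in the episode: the names `clockwork_denoise`, `toy_encode`, `toy_middle`, `toy_decode`, and the `clock` parameter are hypothetical, and plain reuse of cached features stands in for whatever approximation the actual work uses.

```python
from typing import Callable, Optional, Tuple


def clockwork_denoise(
    x: float,
    num_steps: int,
    clock: int,
    encode: Callable[[float], float],
    middle: Callable[[float], float],
    decode: Callable[[float, float], float],
) -> Tuple[float, int]:
    """Run num_steps denoising steps, recomputing `middle` only every `clock` steps.

    Returns the final value and the number of full middle-layer passes.
    """
    cached_mid: Optional[float] = None
    middle_calls = 0
    for step in range(num_steps):
        h = encode(x)  # outer layers: run on every step
        if cached_mid is None or step % clock == 0:
            cached_mid = middle(h)  # expensive middle layers: full pass
            middle_calls += 1
        # On all other steps the cached middle features are reused
        # as a cheap approximation (an assumption of this sketch).
        x = decode(h, cached_mid)
    return x, middle_calls


# Hypothetical stand-ins for the three parts of a U-Net.
def toy_encode(x: float) -> float:
    return 0.9 * x


def toy_middle(h: float) -> float:
    return 0.5 * h


def toy_decode(h: float, mid: float) -> float:
    return h - 0.1 * mid


if __name__ == "__main__":
    _, full = clockwork_denoise(1.0, 8, 1, toy_encode, toy_middle, toy_decode)
    _, sparse = clockwork_denoise(1.0, 8, 4, toy_encode, toy_middle, toy_decode)
    print(f"middle passes: every step = {full}, every 4th step = {sparse}")
```

In this toy setup a larger `clock` trades accuracy in the middle features for fewer expensive passes, while the outer layers still run on every step.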
Actionable Feedback
- Standard LLMs provide basic feedback on actions like squats (e.g., "Successfully completed").
- The "Look, Remember, and Reason" model offers specific, actionable feedback (e.g., "Smooth on the way down").
360° Image Generation Insight
- Text-conditioned 360° HDR images can enhance video portrait relighting.
- This allows dynamic lighting changes by rotating a generated environment around the subject.
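Rotating an environment around the subject can be sketched as follows, assuming the 360° image is stored in an equirectangular layout (an assumption, since the format is not stated here). In that layout a rotation about the vertical axis is a circular shift of the image columns. The function name `rotate_environment` and the sign convention for the yaw angle are hypothetical choices for this illustration.

```python
from typing import List, Sequence


def rotate_environment(env_map: Sequence[Sequence[float]], yaw_degrees: float) -> List[List[float]]:
    """Rotate an equirectangular environment map about the vertical axis.

    env_map is a list of rows; each row spans 360 degrees of azimuth,
    so a yaw rotation becomes a circular shift of every row.
    """
    width = len(env_map[0])
    # Convert the angle to a whole number of columns, wrapped to [0, width).
    shift = round(yaw_degrees / 360.0 * width) % width
    if shift == 0:
        return [list(row) for row in env_map]
    return [list(row[-shift:]) + list(row[:-shift]) for row in env_map]


if __name__ == "__main__":
    # One row of 8 brightness samples with a single bright light at column 0.
    env = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
    for yaw in (0, 90, 180):
        print(yaw, rotate_environment(env, yaw)[0])
```

Relighting a portrait with each rotated map in turn would give the effect of lights moving around the subject; the relighting step itself is outside the scope of this sketch.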