Munawar Hayat
Researcher at Qualcomm AI Research specializing in multimodal generative AI, vision-language models, and efficient on-device AI; author of multiple NeurIPS papers discussed in the episode.
Best podcasts with Munawar Hayat
Ranked by the Snipd community
67 snips
Dec 9, 2025
• 58min
Why Vision Language Models Ignore What They See with Munawar Hayat - #758
Munawar Hayat, a researcher at Qualcomm AI Research specializing in multimodal generative AI, dives into the intricacies of Vision-Language Models (VLMs). He discusses the puzzling issue of object hallucination, explaining why these models often overlook visual evidence in favor of language priors. Munawar also introduces attention-guided alignment techniques and a novel approach to generalized contrastive learning for efficient multimodal retrieval. He shares insights on the Multi-Human Testbench, designed to tackle identity-leakage challenges in generative models, bringing clarity to this evolving field.