Munawar Hayat
Researcher at Qualcomm AI Research specializing in multimodal generative AI, vision-language models, and efficient on-device AI; author of multiple NeurIPS papers discussed in the episode.
Best podcasts with Munawar Hayat
Ranked by the Snipd community
67 snips
Dec 9, 2025
• 58min
Why Vision Language Models Ignore What They See with Munawar Hayat - #758
Munawar Hayat, a researcher at Qualcomm AI Research specializing in multimodal generative AI, dives into the intricacies of Vision-Language Models (VLMs). He discusses the puzzling issue of object hallucination, explaining why these models often overlook visual evidence in favor of language priors. Munawar also introduces attention-guided alignment techniques and a novel approach to generalized contrastive learning for efficient multimodal retrieval. He shares insights on the Multi-Human Testbench, designed to tackle identity-leakage challenges in generative models, bringing clarity to this evolving field.