Microsoft Research Podcast

Abstracts: NeurIPS 2024 with Jindong Wang and Steven Euijong Whang

Dec 13, 2024
Jindong Wang, a researcher, and Steven Euijong Whang, an associate professor at KAIST and co-author of the ERBench paper, discuss ERBench, a benchmark designed to evaluate large language models (LLMs). They explain how relational databases can be used to detect inaccurate answers and to assess model responses more thoroughly. They highlight the role of integrity constraints in constructing multi-hop questions, as well as the range of performance metrics needed to judge model trustworthiness, especially with respect to LLM hallucinations.
11:34

Podcast summary created with Snipd AI

Quick takeaways

  • The introduction of ERBench highlights the importance of evaluating large language models not only for accuracy but also for understanding the underlying rationale behind their responses.
  • The collaboration between research teams demonstrates how leveraging relational databases can enhance the reliability of LLM assessments through defined integrity constraints and functional dependencies.

Deep dives

Addressing Hallucination in Language Models

Hallucination in large language models (LLMs) refers to their tendency to generate false or nonexistent information, which undermines their reliability. The research introduces ERBench, a benchmark that automatically evaluates hallucination by using relational databases. These databases maintain data integrity through fixed schemas, and their functional dependencies help identify the critical keywords that a model should recognize, which makes responses easier to verify. The study therefore emphasizes assessing not only the accuracy of LLM outputs but also the rationale behind them, in order to build trust in their applications.
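
To make the idea concrete, here is a minimal sketch of how a functional dependency could drive an automatic check. It is an illustration under stated assumptions, not the ERBench implementation: the `movie` table, the assumed dependency (title, year) -> director, and the helpers `fd_holds`, `build_question`, and `mentions_keyword` are all hypothetical names chosen for this example, and a simple substring match stands in for whatever verification the benchmark actually performs.

```python
import sqlite3

# Hypothetical toy relation; the single row is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (title TEXT, year INTEGER, director TEXT)")
conn.execute("INSERT INTO movie VALUES ('Inception', 2010, 'Christopher Nolan')")


def fd_holds(conn):
    """Check the assumed dependency (title, year) -> director.

    The dependency holds if no (title, year) pair maps to more than
    one distinct director.
    """
    violations = conn.execute(
        "SELECT title, year FROM movie "
        "GROUP BY title, year HAVING COUNT(DISTINCT director) > 1"
    ).fetchall()
    return len(violations) == 0


def build_question(title, year):
    """Build a question from the left-hand side of the dependency."""
    return f"Who directed the movie {title}, released in {year}? Explain your reasoning."


def mentions_keyword(response, keyword):
    """Return True if the response contains the value the dependency determines."""
    return keyword.lower() in response.lower()


# Each row yields one question and one expected keyword.
checks = [
    (build_question(title, year), director)
    for title, year, director in conn.execute("SELECT title, year, director FROM movie")
]
```

Because the dependency guarantees a single director for each (title, year) pair, the expected keyword is unambiguous, so a response to each question in `checks` can be scored by calling `mentions_keyword` on it without a human grader.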
