Machine Learning Street Talk (MLST)

#73 - YASAMAN RAZEGHI & Prof. SAMEER SINGH - NLP benchmarks

Apr 7, 2022
Yasaman Razeghi, a PhD student at UC Irvine, discusses her research showing that large language models' strong performance on reasoning tasks is driven largely by memorization of their pretraining data. Prof. Sameer Singh, an expert in machine learning interpretability, shares insights on the perils of metric obsession in evaluating AI. They delve into the importance of understanding human-like reasoning in AI and advocate for nuanced metrics that truly assess model capabilities. Their conversation looks ahead to the future of model testing and explainability.
AI Snips
INSIGHT

Memorization, Not Reasoning

  • Large language models (LLMs) show strong performance on reasoning tasks, especially numerical reasoning such as multiplication.
  • That performance correlates with how often the relevant terms (e.g., the operands) appear in the pretraining data, suggesting memorization rather than robust reasoning; a sketch of this measurement follows below.
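
To make the first insight concrete, here is a minimal sketch of the kind of frequency-vs-accuracy analysis Razeghi describes. Everything in it is illustrative: the corpus sample, the per-operand accuracies, and the helper names are assumptions for this sketch, not the paper's actual pipeline (the paper counted term occurrences in GPT-J's pretraining corpus and measured its arithmetic accuracy).

```python
# Sketch of a frequency-vs-accuracy analysis (illustrative, not the paper's code).
# Assumptions: `corpus_lines` stands in for a pretraining-corpus sample, and
# `accuracy_by_operand` maps each operand (e.g., the first number in "a times b")
# to the model's average accuracy on arithmetic prompts containing it.
import math
import re
from collections import Counter

from scipy.stats import spearmanr  # rank correlation, robust to scale


def count_term_frequencies(corpus_lines, terms):
    """Count how often each term occurs as a standalone token in the corpus."""
    counts = Counter()
    patterns = {t: re.compile(rf"\b{re.escape(t)}\b") for t in terms}
    for line in corpus_lines:
        for term, pattern in patterns.items():
            counts[term] += len(pattern.findall(line))
    return counts


def frequency_accuracy_correlation(counts, accuracy_by_operand):
    """Spearman correlation between log term frequency and task accuracy."""
    terms = sorted(accuracy_by_operand)
    log_freqs = [math.log1p(counts[t]) for t in terms]
    accuracies = [accuracy_by_operand[t] for t in terms]
    rho, p_value = spearmanr(log_freqs, accuracies)
    return rho, p_value


# Toy usage with made-up numbers: frequent operands score higher, so the
# correlation comes out strongly positive, mirroring the reported finding.
corpus_lines = ["23 times 4 is 92", "23 and 23 again", "7 appears once"]
accuracy_by_operand = {"23": 0.9, "7": 0.4, "41": 0.1}
counts = count_term_frequencies(corpus_lines, list(accuracy_by_operand))
print(frequency_accuracy_correlation(counts, accuracy_by_operand))
```

A strongly positive rank correlation, as the episode discusses for multiplication, is evidence that frequent operands are answered correctly because they were seen often, not because the model is computing.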
INSIGHT

Right for the Wrong Reasons

  • Machine learning practice often prioritizes behavioral outputs and metrics over understanding how models arrive at their answers.
  • As a result, models sometimes get correct results for the wrong reasons, e.g., by exploiting spurious correlations in a benchmark rather than performing the intended reasoning; see the toy probe below.
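
In the spirit of behavioral testing work such as CheckList (which Singh co-authored), a toy probe illustrates this failure mode. The classifier and examples below are invented for illustration: the model scores perfectly on the headline metric while relying on a spurious cue, and a meaning-preserving paraphrase exposes it.

```python
# Toy probe for "right for the wrong reasons" (everything here is invented).
def toy_classifier(text: str) -> str:
    # Hypothetical model whose decision rule keys on a spurious token.
    return "positive" if "movie" in text else "negative"


# Original test set: the spurious cue happens to align with the true label,
# so the model looks perfect on the headline metric.
test_set = [
    ("a wonderful movie", "positive"),
    ("dull and lifeless", "negative"),
]
accuracy = sum(toy_classifier(x) == y for x, y in test_set) / len(test_set)
print(f"test accuracy: {accuracy:.0%}")  # 100%

# Invariance check: a meaning-preserving paraphrase should not flip the
# prediction. Here it does, revealing the model was right for the wrong reason.
original, paraphrase = "a wonderful movie", "a wonderful film"
print(toy_classifier(original), "->", toy_classifier(paraphrase))  # positive -> negative
```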
INSIGHT

OpenAI's Secrecy

  • OpenAI's guardedness about their training data raises concerns.
  • They might want to promote the belief that their models are doing more than memorizing data.