Machine Learning Street Talk (MLST)

#73 - YASAMAN RAZEGHI & Prof. SAMEER SINGH - NLP benchmarks

Apr 7, 2022
Yasaman Razeghi, a PhD student at UC Irvine, discusses her research showing that large language models' strong performance on reasoning tasks is driven largely by memorization of their pretraining data. Prof. Sameer Singh, an expert in machine learning interpretability, shares insights on the perils of metric obsession in evaluating AI. They delve into the importance of understanding human-like reasoning in AI and advocate for nuanced metrics that truly assess model capabilities. Their conversation looks ahead to the future of model testing and explainability.
AI Snips
INSIGHT

Memorization, Not Reasoning

  • Large language models (LLMs) show strong performance on reasoning tasks, especially numerical reasoning such as multiplication.
  • That performance correlates with how often the relevant terms (e.g., the operands) appear in the pretraining data, suggesting memorization rather than robust reasoning; a sketch of this measurement follows below.
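
To make the first insight concrete, here is a minimal sketch of the kind of frequency-vs-accuracy analysis Razeghi describes. Everything in it is illustrative: the corpus sample, the per-operand accuracies, and the helper names are assumptions for this sketch, not the paper's actual pipeline (the paper counted term occurrences in GPT-J's pretraining corpus and measured its arithmetic accuracy).

```python
# Sketch of a frequency-vs-accuracy analysis (illustrative, not the paper's code).
# Assumptions: `corpus_lines` stands in for a pretraining-corpus sample, and
# `accuracy_by_operand` maps each operand (e.g., the first number in "a times b")
# to the model's average accuracy on arithmetic prompts containing it.
import math
import re
from collections import Counter

from scipy.stats import spearmanr  # rank correlation, robust to scale


def count_term_frequencies(corpus_lines, terms):
    """Count how often each term occurs as a standalone token in the corpus."""
    counts = Counter()
    patterns = {t: re.compile(rf"\b{re.escape(t)}\b") for t in terms}
    for line in corpus_lines:
        for term, pattern in patterns.items():
            counts[term] += len(pattern.findall(line))
    return counts


def frequency_accuracy_correlation(counts, accuracy_by_operand):
    """Spearman correlation between log term frequency and task accuracy."""
    terms = sorted(accuracy_by_operand)
    log_freqs = [math.log1p(counts[t]) for t in terms]
    accuracies = [accuracy_by_operand[t] for t in terms]
    rho, p_value = spearmanr(log_freqs, accuracies)
    return rho, p_value


# Toy usage with made-up numbers: frequent operands score higher, so the
# correlation comes out strongly positive, mirroring the reported finding.
corpus_lines = ["23 times 4 is 92", "23 and 23 again", "7 appears once"]
accuracy_by_operand = {"23": 0.9, "7": 0.4, "41": 0.1}
counts = count_term_frequencies(corpus_lines, list(accuracy_by_operand))
print(frequency_accuracy_correlation(counts, accuracy_by_operand))
```

A strongly positive rank correlation, as the episode discusses for multiplication, is evidence that frequent operands are answered correctly because they were seen often, not because the model is computing.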
INSIGHT

Right for the Wrong Reasons

  • Machine learning practice often prioritizes behavioral outputs and metrics over understanding how models arrive at their answers.
  • As a result, models sometimes get correct results for the wrong reasons, e.g., by exploiting spurious correlations in a benchmark rather than performing the intended reasoning; see the toy probe below.
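
In the spirit of behavioral testing work such as CheckList (which Singh co-authored), a toy probe illustrates this failure mode. The classifier and examples below are invented for illustration: the model scores perfectly on the headline metric while relying on a spurious cue, and a meaning-preserving paraphrase exposes it.

```python
# Toy probe for "right for the wrong reasons" (everything here is invented).
def toy_classifier(text: str) -> str:
    # Hypothetical model whose decision rule keys on a spurious token.
    return "positive" if "movie" in text else "negative"


# Original test set: the spurious cue happens to align with the true label,
# so the model looks perfect on the headline metric.
test_set = [
    ("a wonderful movie", "positive"),
    ("dull and lifeless", "negative"),
]
accuracy = sum(toy_classifier(x) == y for x, y in test_set) / len(test_set)
print(f"test accuracy: {accuracy:.0%}")  # 100%

# Invariance check: a meaning-preserving paraphrase should not flip the
# prediction. Here it does, revealing the model was right for the wrong reason.
original, paraphrase = "a wonderful movie", "a wonderful film"
print(toy_classifier(original), "->", toy_classifier(paraphrase))  # positive -> negative
```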
INSIGHT

OpenAI's Secrecy

  • OpenAI's guardedness about their training data raises concerns.
  • They might want to promote the belief that their models are doing more than memorizing data.