The Gradient: Perspectives on AI

Sewon Min: The Science of Natural Language

Mar 23, 2023
INSIGHT

Benchmarks Can Mislead Progress

  • Benchmarks often incentivize solving the benchmark itself rather than the real-world problem it was intended to measure.
  • Sewon Min argues we should build benchmarks from real user problems to reveal meaningful challenges.
INSIGHT

Multi-Hop Datasets Hide Shortcuts

  • Many multi-hop QA datasets contain shortcuts that let single-hop strategies succeed even though the questions are written to be compositional.
  • Min shows that dataset construction often lets models avoid true multi-step reasoning.
ADVICE

Design Benchmarks From Real Questions

  • Build benchmarks from real user tasks to surface authentic challenges rather than synthetic ones.
  • Examine real queries carefully to discover hidden difficulties before designing datasets.