The Gradient: Perspectives on AI

Sewon Min: The Science of Natural Language

Mar 23, 2023

01:42:44

INSIGHT

Benchmarks Can Mislead Progress

Benchmarks often incentivize solving the benchmark itself rather than the real-world problem it was intended to measure.
Sewon Min argues we should build benchmarks from real user problems to reveal meaningful challenges.

INSIGHT

Multi-Hop Datasets Hide Shortcuts

Many multi-hop QA datasets contain shortcuts that let single-hop strategies succeed even on compositional questions.
Min shows that dataset construction often lets models avoid true multi-step reasoning.

ADVICE

Design Benchmarks From Real Questions

Build benchmarks from real user tasks to surface authentic challenges rather than synthetic ones.
Examine real queries carefully to discover hidden difficulties before designing datasets.

How My Interest in Question Answering Evolved From There

04:16 • 2min

The History of Benchmarks in AI

06:18 • 3min

How to Ablate and Verify Complex Reasoning for a Multi-Hop QA Benchmark

09:23 • 3min

How to Decompose a Multi-Hop Question Answering Benchmark

11:57 • 4min

How to Construct a Good Benchmark for QA

16:12 • 2min

AmbigQA: A Benchmark for Ambiguous Questions

18:36 • 4min

The Different Types of Ambiguity in Grey's Anatomy

22:08 • 2min

How to Solve the Ambiguity Problem

24:22 • 1min

How to Construct a Large-Scale Fact Verification Dataset With Real-World Claims

25:52 • 2min

The Problem Beyond Fact-Checking

28:03 • 2min

How to Create a Dataset to Challenge False Claims

29:54 • 2min

The Importance of Intuition in Machine Learning

31:41 • 3min

The Future of Deep Learning Models

34:13 • 2min

What Is In-Context Learning?

36:38 • 2min

The Role of Demonstrations in In-Context Learning

39:02 • 4min

The Importance of Input Distribution in In-Context Learning

42:49 • 2min

In-Context Learning: A Meta-Optimization Process

45:02 • 2min

How to Improve Efficiency for In-Context Learning

47:00 • 3min

The Importance of Task Transfer in MetaICL

49:43 • 2min

The Role of Compositionality in MetaICL

51:54 • 2min

The Effect of Meta-Training Tasks on Task Transfer

54:07 • 3min

The Meta-Training of Language Models

56:49 • 3min

The Role of MetaICL in Bayesian Inference

59:33 • 3min

The Copying Effect: How Closeness Affects Model Prediction

01:02:10 • 3min

The Evolution of Intuition in In-Context Learning

01:04:54 • 3min

Chain of Thought Prompting

01:07:38 • 2min

How/Why Does It Work?

01:09:30 • 2min

The Problem With Parametric Models in Language Models

01:11:24 • 4min

Dense Passage Retrieval for Open Domain Question Answering

01:15:03 • 3min

The Importance of Similarity in Question Answering

01:18:14 • 2min

How to Improve a Retrieval Method

01:19:57 • 2min

Nonparametric Masked Language Modeling

01:21:35 • 3min

How to Train a Language Model Like This

01:24:38 • 2min

The Importance of Non-Parametric Modeling in Language Models

01:26:50 • 2min

The Importance of Verification and Attribution in Language Models

01:28:54 • 2min

The Difficulty of Doing a PhD in Machine Learning

01:30:26 • 3min

The Hardest Part of Doing a PhD

01:33:56 • 3min

Advice for a PhD Candidate

01:36:33 • 2min

The Importance of Impactful Metrics in Academic Work

01:38:04 • 2min

How to Choose the Right Advisor for Your PhD Program

01:40:17 • 2min

In episode 65 of The Gradient Podcast, Daniel Bashir speaks to Sewon Min.

Sewon is a fifth-year PhD student in the NLP group at the University of Washington, advised by Hannaneh Hajishirzi and Luke Zettlemoyer. She is a part-time visiting researcher at Meta AI and a recipient of the JP Morgan PhD Fellowship. She has previously spent time at Google Research and Salesforce Research.

Have suggestions for future podcast guests (or other feedback)? Let us know here!

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro

* (03:00) Origin Story

* (04:20) Evolution of Sewon’s interests, question-answering and practical NLP

* (07:00) Methodology concerns about benchmarks

* (07:30) Multi-hop reading comprehension

* (09:30) Do multi-hop QA benchmarks actually measure multi-hop reasoning?

* (12:00) How models can “cheat” multi-hop benchmarks

* (13:15) Explicit compositionality

* (16:05) Commonsense reasoning and background information

* (17:30) On constructing good benchmarks

* (18:40) AmbigQA and ambiguity

* (22:20) Types of ambiguity

* (24:20) Practical possibilities for models that can handle ambiguity

* (25:45) FaVIQ and fact-checking benchmarks

* (28:45) External knowledge

* (29:45) Fact verification and “complete understanding of evidence”

* (31:30) Do models do what we expect/intuit in reading comprehension?

* (34:40) Applications for fact-checking systems

* (36:40) Intro to in-context learning (ICL)

* (38:55) Example of an ICL demonstration

* (40:45) Rethinking the Role of Demonstrations and what matters for successful ICL

* (43:00) Evidence for a Bayesian inference perspective on ICL

* (45:00) ICL + gradient descent and what it means to “learn”

* (47:00) MetaICL and efficient ICL

* (49:30) Distance between tasks and MetaICL task transfer

* (53:00) Compositional tasks for language models, compositional generalization

* (55:00) The number and diversity of meta-training tasks

* (58:30) MetaICL and Bayesian inference

* (1:00:30) Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations

* (1:02:00) The copying effect

* (1:03:30) Copying effect for non-identical examples

* (1:06:00) More thoughts on ICL

* (1:08:00) Understanding Chain-of-Thought Prompting

* (1:11:30) Bayes strikes again

* (1:12:30) Intro to Sewon’s text retrieval research

* (1:15:30) Dense Passage Retrieval (DPR)

* (1:18:40) Similarity in QA and retrieval

* (1:20:00) Improvements for DPR

* (1:21:50) Nonparametric Masked Language Modeling (NPM)

* (1:24:30) Difficulties in training NPM and solutions

* (1:26:45) Follow-on work

* (1:29:00) Important fundamental limitations of language models

* (1:31:30) Sewon’s experience doing a PhD

* (1:34:00) Research challenges suited for academics

* (1:35:00) Joys and difficulties of the PhD

* (1:36:30) Sewon’s advice for aspiring PhDs

* (1:38:30) Incentives in academia, production of knowledge

* (1:41:50) Outro

Links:

* Sewon’s homepage and Twitter

* Papers

  * Solving and re-thinking benchmarks

    * Multi-hop Reading Comprehension through Question Decomposition and Rescoring / Compositional Questions Do Not Necessitate Multi-hop Reasoning

    * AmbigQA: Answering Ambiguous Open-domain Questions

    * FaVIQ: FAct Verification from Information-seeking Questions

  * Language Modeling

    * Rethinking the Role of Demonstrations

    * MetaICL: Learning to Learn In Context

    * Towards Understanding CoT Prompting

    * Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations

  * Text representation/retrieval

    * Dense Passage Retrieval

    * Nonparametric Masked Language Modeling

Get full access to The Gradient at thegradientpub.substack.com/subscribe