Chapters
Introduction
00:00 • 2min
Dynamic Benchmarks and Models in the Loop of Benchmark Creation
02:09 • 2min
How to Train Models in Multiple Rounds
04:17 • 2min
How to Fool a Model in an NLI Task
05:54 • 2min
The Importance of Keeping Models in the Loop
08:12 • 2min
The Difficulty of Models in Different Domains
10:39 • 2min
The High-Level Trends in the Results of the NLI Paper
12:21 • 2min
The Effects of Different Domains on Model Performance
14:38 • 2min
The Effect of Prompts on Sentiments
16:51 • 2min
How to Use Prompts to Create Entirely New Inputs
18:59 • 3min
The Risks and Objections to Asymmetric Data Collection and Dynamic Benchmarking
22:13 • 3min
The Importance of Numerical Reasoning in QA Models
24:58 • 2min
The Importance of Self-Contained Question Answering
26:58 • 4min
Building a Question Answering Model
31:04 • 2min
Dynaboard: A Model Evaluation Platform
32:38 • 6min
How to Integrate Utility and Computing Into a Leaderboard
38:08 • 3min
How to Determine the Ratio Between Throughput and Performance on a Leaderboard
40:59 • 3min
How to Scale a Dynamic Task Platform
43:46 • 3min