Chapters
Introduction
00:00 • 2min
Dynamic Benchmarks and Models in the Loop of Benchmark Creation
02:09 • 2min
How to Train Models in Multiple Rounds
04:17 • 2min
How to Fool a Model in an NLI Task
05:54 • 2min
The Importance of Keeping Models in the Loop
08:12 • 2min
The Difficulty of Models in Different Domains
10:39 • 2min
The High-Level Trends in the Results of the NLI Paper
12:21 • 2min
The Effects of Different Domains on Model Performance
14:38 • 2min
The Effect of Prompts on Sentiments
16:51 • 2min
How to Use Prompts to Create Entirely New Inputs
18:59 • 3min
The Risks and Objections to Asymmetric Data Collection and Dynamic Benchmarking
22:13 • 3min
The Importance of Numerical Reasoning in QA Models
24:58 • 2min
The Importance of Self-Contained Question Answering
26:58 • 4min
Building a Question Answering Model
31:04 • 2min
Dynaboard: A Model Evaluation Platform
32:38 • 6min
How to Integrate Utility and Computing Into a Leaderboard
38:08 • 3min
How to Determine the Ratio Between Throughput and Performance on a Leaderboard
40:59 • 3min
How to Scale a Dynamic Task Platform
43:46 • 3min