AXRP - the AI X-risk Research Podcast

16 - Preparing for Debate AI with Geoffrey Irving

Jul 1, 2022
1
Introduction
00:00 • 3min
2
Is There a Future for Debate?
03:06 • 2min
3
The Human Interaction in Debate
04:58 • 3min
4
Is There Anything I Should Ask About Debate?
08:12 • 2min
5
The Importance of Language Models for Safety Alignment
10:22 • 2min
6
Language and Human Preferences
12:01 • 3min
7
Is There a Solution to ELK Without a Solution to Scalable Alignment?
15:15 • 2min
8
Scaling Scalability to Generate Explanations
17:08 • 2min
9
Using a Language Model to Detect a Deficiency Problem
19:33 • 2min
10
Is There a Better Detector Quality for Detecting Failures?
21:37 • 2min
11
Is the Second Approximation a Good Idea?
23:34 • 3min
12
How to Find Failures in a Language Model?
26:16 • 3min
13
Gopher Language Models
29:10 • 2min
14
How to Generate a Red Teaming Model
31:04 • 2min
15
How to Fine-Tune an RL Code Base for Language Models
33:32 • 2min
16
Teaching Language Models to Support Answers With Verified Quotes
35:24 • 2min
17
Language Model Interpretability
37:15 • 2min
18
Is There a Language Model in Isolation?
39:06 • 2min
19
Is There a Reward Model for the Answers?
40:51 • 2min
20
I Don't Have the Number for You on Hand.
42:22 • 2min
21
Do You Have Any Lessons About Human Learning?
44:21 • 2min
22
How Much Time Does It Take to Write a RPG?
46:19 • 2min
23
Uncertainty Estimation for Language Reward Models
47:58 • 2min
24
How Hard Is Uncertainty Estimation?
50:24 • 2min
25
How Much Can You Leave Fixed?
52:18 • 4min
26
How Can That Be Right?
56:30 • 3min
27
Disentangling Aleatoric and Epistemic Uncertainty?
59:48 • 2min
28
Recruiting for Machine Learning and Cognitive Science
01:01:22 • 3min