Chapters
Introduction
00:00 • 3min
Is There a Future for Debate?
03:06 • 2min
The Human Interaction in Debate
04:58 • 3min
Is There Anything I Should Ask About Debate?
08:12 • 2min
The Importance of Language Models for Safety Alignment
10:22 • 2min
Language and Human Preferences
12:01 • 3min
Is There a Solution to ELK Without a Solution to Scalable Alignment?
15:15 • 2min
Scaling Scalability to Generate Explanations
17:08 • 2min
Using a Language Model to Detect a Deficiency Problem
19:33 • 2min
Is There a Better Detector for Finding Failures?
21:37 • 2min
Is the Second Approximation a Good Idea?
23:34 • 3min
How to Find Failures in a Language Model?
26:16 • 3min
Gopher Language Models
29:10 • 2min
How to Generate a Red Teaming Model
31:04 • 2min
How to Fine-Tune an RL Code Base for Language Models
33:32 • 2min
Teaching Language Models to Support Answers With Verified Quotes
35:24 • 2min
Language Model Interpretability
37:15 • 2min
Is There a Language Model in Isolation?
39:06 • 2min
Is There a Reward Model for the Answers?
40:51 • 2min
I Don't Have the Number for You on Hand.
42:22 • 2min
Do You Have Any Lessons About Human Learning?
44:21 • 2min
How Much Time Does It Take to Write an RPG?
46:19 • 2min
Uncertainty Estimation for Language Reward Models
47:58 • 2min
How Hard Is Uncertainty Estimation?
50:24 • 2min
How Much Can You Leave Fixed?
52:18 • 4min
How Can That Be Right?
56:30 • 3min
Disentangling Aleatoric and Epistemic Uncertainty
59:48 • 2min
Recruiting for Machine Learning and Cognitive Science
01:01:22 • 3min