Chapters
Introduction
00:00 • 2min
Using Human Feedback to Define a Reward Function
01:42 • 2min
Web Assisted Question Answering With Human Feedback
03:29 • 4min
WebGPT - What Inspired the Idea?
07:07 • 3min
RL From Human Feedback?
09:44 • 2min
Do You Think the RL Task Is Going to Work?
12:12 • 4min
WebGPT - Is There a Line Between Knowledge and Understanding?
16:05 • 2min
InstructGPT - A Language Model That's Fine-Tuned to Follow Instructions
18:03 • 4min
What Can Language Models Learn From the Internet?
22:14 • 3min
Generalization and Reward Models in Machine Learning
25:21 • 2min
Can Language Models Tell Us What They Know and Don't Know?
27:21 • 2min
How Do You See AI Alignment?
29:50 • 2min
Is the Standard RL Framing Problematic?
32:02 • 2min
The Relationship Between AGI and RL
33:51 • 2min
Is It Time to Rethink PPO?
35:30 • 2min
Patents in Machine Learning Algorithms?
37:02 • 2min
What Are the Biggest Highlights in RL?
38:37 • 2min
Using an On-Policy Algorithm With a Reward Model
40:13 • 2min
The Future of RL Algorithms
42:33 • 2min