

Ep 14 - Interp, latent robustness, RLHF limitations w/ Stephen Casper (PhD AI researcher, MIT)
Jun 19, 2024
PhD AI researcher Stephen Casper discusses interpretability, robustness, and the limitations of AI models. The conversation covers AI safety, catastrophic risks, research gaps, and advice for early-career researchers, with Casper emphasizing finding flaws in neural networks, AGI risks, and practically useful AI research.
Chapters
Intro
00:00 • 3min
Exploring AI Safety and Risks in the Field
02:43 • 17min
Assessing Catastrophic Risks in Artificial Intelligence
20:06 • 11min
Navigating Risks in Advanced AI Systems
30:40 • 30min
Interpretability in AI with Sparse Autoencoders
01:00:48 • 5min
Challenges and Limitations in Interpretability Research
01:06:12 • 29min
Closing the Gap: Addressing Real-world Problems in AI Research
01:34:53 • 5min
Exploring Practical Usefulness and Progress in Research Fields
01:39:58 • 2min
Enhancing AI Safety through Latent Adversarial Training
01:41:43 • 13min
Rethinking AI Training: Latent Adversarial Training
01:54:27 • 13min
The Significance of Adversarial Training and Scoping in AI Models
02:07:19 • 18min
Exploring Key Gaps in AI Safety Field and Advice for New Entrants
02:25:20 • 4min
Importance of Reading and Staying Updated in AI Research
02:29:05 • 9min
Exploring the Role of Non-Industry Professionals in Driving Socio-Technical Change
02:38:00 • 2min
Exploring Various Topics in AI Research and Knowledge Exchange
02:40:04 • 2min