

Dan Hendrycks on Catastrophic AI Risks
Nov 3, 2023
Dan Hendrycks, AI risk expert, discusses X.ai, evolving AI risk thinking, malicious use of AI, AI race dynamics, making AI organizations safer, and representation engineering for understanding AI traits like deception.
Chapters
Introduction
00:00 • 2min
Categorization of Catastrophic Risks from AI
02:15 • 10min
Analyzing the Risks of Malicious AI and Bioengineered Viruses
11:56 • 22min
Risks of a Military AI Race
34:11 • 30min
Incentivizing Safety Research and Diversifying Portfolio
01:04:16 • 6min
Internal Decision-Making Processes and the Swiss Cheese Model of Organizational Safety
01:10:15 • 5min
Updating the AI Safety Textbook
01:15:36 • 7min
The Risks of Rogue AI and Proxy Gaming
01:22:57 • 5min
Addressing Adversarial Optimization Pressure
01:28:10 • 25min
Deceptive Behavior in Reinforcement Learning
01:53:37 • 4min
Representation Engineering versus Mechanistic Interpretability
01:57:54 • 10min