This is a special crosspost episode where Adam Gleave is interviewed by Nathan Labenz from the Cognitive Revolution. At the end I also have a discussion with Nathan Labenz about his takes on AI.
Adam Gleave is the founder of FAR.AI. He and Nathan discuss finding vulnerabilities in GPT-4's fine-tuning and Assistants APIs, FAR.AI's work exposing exploitable flaws in "superhuman" Go AIs through innovative adversarial strategies, accidental jailbreaking by naive developers during fine-tuning, and more.
OUTLINE
(00:00) Intro
(02:57) NATHAN INTERVIEWS ADAM GLEAVE: FAR.AI's Mission
(05:33) Unveiling the Vulnerabilities in GPT-4's Fine-Tuning and Assistants APIs
(11:48) Divergence Between The Growth Of System Capability And The Improvement Of Control
(13:15) Finding Substantial Vulnerabilities
(14:55) Exploiting GPT-4 APIs: Accidentally jailbreaking a model
(18:51) On Fine-Tuned Attacks and Targeted Misinformation
(24:32) Malicious Code Generation
(27:12) Discovering Private Emails
(29:46) Harmful Assistants
(33:56) Hijacking the Assistant Based on the Knowledge Base
(36:41) The Ethical Dilemma of AI Vulnerability Disclosure
(46:34) Exploring AI's Ethical Boundaries and Industry Standards
(47:47) The Dangers of AI in Unregulated Applications
(49:30) AI Safety Across Different Domains
(51:09) Strategies for Enhancing AI Safety and Responsibility
(52:58) Taxonomy of Affordances and Minimal Best Practices for Application Developers
(57:21) Open Source in AI Safety and Ethics
(1:02:20) Vulnerabilities of Superhuman Go-Playing AIs
(1:23:28) Variation on AlphaZero Style Self-Play
(1:31:37) The Future of AI: Scaling Laws and Adversarial Robustness
(1:37:21) MICHAEL TRAZZI INTERVIEWS NATHAN LABENZ
(1:37:33) Nathan’s background
(1:39:44) Where does Nathan fall on the Eliezer-to-Kurzweil spectrum?
(1:47:52) AI in biology could spiral out of control
(1:56:20) Bioweapons
(2:01:10) Adoption Accelerationist, Hyperscaling Pauser
(2:06:26) Current Harms vs. Future Harms, risk tolerance
(2:11:58) Jailbreaks, Nathan’s experiments with Claude
The Cognitive Revolution: https://www.cognitiverevolution.ai/
Exploiting Novel GPT-4 APIs: https://far.ai/publication/pelrine2023novelapis/
Adversarial Policies Beat Superhuman Go AIs: https://far.ai/publication/wang2022adversarial/