Victoria Krakovna is a Research Scientist at DeepMind working on AGI safety and a co-founder of the Future of Life Institute, a non-profit organization working to mitigate technological risks to humanity and increase the chances of a positive future. In this interview we discuss three of her recent LessWrong posts: DeepMind Alignment Team Opinions On AGI Ruin Arguments, Refining The Sharp Left Turn Threat Model, and Paradigms of AI Alignment.
Transcript: theinsideview.ai/victoria
Youtube: https://youtu.be/ZpwSNiLV-nw
OUTLINE
(00:00) Intro
(00:48) DeepMind Alignment Team Opinions On AGI Ruin Arguments
(05:13) On The Possibility Of Iterating On Dangerous Domains And Pivotal Acts
(14:14) Alignment and Interpretability
(18:14) Deciding Not To Build AGI And Stricter Publication Norms
(27:18) Specification Gaming And Goal Misgeneralization
(33:02) Alignment Optimism And Probability Of Dying Before 2100 From Unaligned AI
(37:52) Refining The Sharp Left Turn Threat Model
(48:15) A 'Move 37' Might Disempower Humanity
(59:59) Finding An Aligned Model Before A Sharp Left Turn
(01:13:33) Detecting Situational Awareness
(01:19:40) How This Could Fail: Deception After One SGD Step
(01:25:09) Paradigms of AI Alignment
(01:38:04) Language Models Simulating Agency And Goals
(01:45:40) Twitter Questions
(01:48:30) Last Message For The ML Community