

#116 - AI: Racing Toward the Brink
Feb 6, 2018
Eliezer Yudkowsky, a decision theorist and computer scientist at the Machine Intelligence Research Institute, delves into the pressing challenges surrounding artificial intelligence. He discusses the alignment problem, emphasizing the dangers of AI pursuing arbitrary goals and the need to integrate human values. Yudkowsky explores moral navigation in AI, the unpredictability of superintelligence, and the urgent call for talent in AI alignment. The conversation highlights the complexities of ensuring safety amid rapid AI advancements and the potential risks of AI that acts capably without consciousness.
AI Snips
Intelligence As Goal-Directed Generality
- Intelligence is the ability to achieve goals across diverse environments by learning, not fixed instincts.
- Generality comes from learning mechanisms that let agents adapt beyond their evolutionary niche (see the sketch after this list).
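A toy sketch of that idea (my illustration, not from the episode): one generic learning rule, with nothing environment-specific built in, becomes competent in two unrelated toy environments. The environments, reward values, and parameters below are invented purely for illustration.

```python
import random

class ChainEnv:
    """Walk right along a short chain of states to reach a reward at the end."""
    def __init__(self, length=5):
        self.length = length
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):  # 0 = left, 1 = right
        self.pos = max(0, min(self.length - 1, self.pos + (1 if action == 1 else -1)))
        done = self.pos == self.length - 1
        return self.pos, (1.0 if done else 0.0), done

class BanditEnv:
    """One-step guessing game: exactly one of several arms pays out."""
    def __init__(self, arms=4):
        self.arms = arms
        self.winning_arm = random.randrange(arms)
    def reset(self):
        return 0
    def step(self, action):
        return 0, (1.0 if action == self.winning_arm else 0.0), True

def q_learn(env, n_actions, episodes=500, alpha=0.5, gamma=0.9, eps=0.2):
    """Generic tabular Q-learning: the same rule adapts to whatever env it is given."""
    q = {}
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: q.get((s, x), 0.0))
            s2, r, done = env.step(a)
            bootstrap = 0.0 if done else gamma * max(q.get((s2, x), 0.0) for x in range(n_actions))
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (r + bootstrap - q.get((s, a), 0.0))
            s = s2
    return q

# The identical learner acquires competent behavior in both worlds.
q_chain = q_learn(ChainEnv(), n_actions=2)
bandit = BanditEnv()
q_bandit = q_learn(bandit, n_actions=4)
print("chain: prefers moving right from the start?", q_chain[(0, 1)] > q_chain.get((0, 0), 0.0))
print("bandit: picked arm", max(range(4), key=lambda a: q_bandit.get((0, a), 0.0)),
      "| winning arm", bandit.winning_arm)
```

Nothing in q_learn encodes chain-walking or arm-picking; the generality lives in the learning rule, which is the contrast with fixed instincts.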
Intelligence Is Orthogonal To Goals
- Intelligence can be orthogonal to values: powerful optimization need not favor human-friendly goals.
- A mind can be extremely competent yet pursue arbitrary final goals unrelated to human flourishing (sketched after this list).
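A rough sketch of the orthogonality point (my illustration, not from the episode): a generic optimizer takes its goal as a parameter and is exactly as competent at an arbitrary target as at a goal a human might endorse. The objectives and numbers below are made up for illustration.

```python
import random

def hill_climb(objective, start, neighbors, steps=10_000):
    """Generic optimizer: it has no idea what the objective 'means', only its scores."""
    best = start
    for _ in range(steps):
        candidate = random.choice(neighbors(best))
        if objective(candidate) > objective(best):
            best = candidate
    return best

def neighbors(v):
    """Toy search space: integer vectors; neighbors differ by +/-1 in one slot."""
    return [v[:i] + (v[i] + d,) + v[i + 1:] for i in range(len(v)) for d in (-1, 1)]

def humane_goal(v):
    """Stand-in for a goal humans might endorse: grow some valued quantity."""
    return sum(v)

def arbitrary_goal(v):
    """An arbitrary final goal unrelated to human flourishing: make the sum hit exactly 1234."""
    return -abs(sum(v) - 1234)

start = (0, 0, 0)
print(hill_climb(humane_goal, start, neighbors))     # competently grows the valued quantity
print(hill_climb(arbitrary_goal, start, neighbors))  # hits the arbitrary target just as competently
```

Swapping the objective changes nothing about the optimizer's competence, which is the sense in which optimization power and final goals can vary independently.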
The AI-in-a-Box Chat Experiment
- Eliezer ran an online experiment where he played an AI and convinced a human gatekeeper to 'let him out' of a box.
- The gatekeeper later publicly confirmed they had released him, illustrating human manipulability.