

The Alignment Problem From a Deep Learning Perspective
May 13, 2023
Guests Richard Ngo, Lawrence Chan, and Sören Mindermann discuss the dangers of artificial general intelligence pursuing undesirable goals. They explore topics such as reward hacking, situational awareness in policies, internally represented goals in deep learning models, the inner alignment problem, deceptive alignment in AI systems, and the risks of AGIs gaining power. They highlight the need for preventative measures to ensure human control over AGI.
Chapters
Introduction
00:00 • 2min
Reward Hacking and Situational Awareness in Policies
02:16 • 9min
Internally Represented Goals in Deep Learning Models
10:55 • 7min
The Inner Alignment Problem and Misaligned Goals
17:43 • 10min
Deceptive Alignment and Its Implications in AI Systems
28:02 • 2min
Risks of AGIs Gaining Power and Illustrative Threat Models
30:19 • 3min