AI Safety Fundamentals: Alignment

The Alignment Problem From a Deep Learning Perspective

May 13, 2023
Guests Richard Ngo, Lawrence Chan, and Sören Mindermann discuss the dangers of artificial general intelligence pursuing undesirable goals. They explore topics including reward hacking, situational awareness in learned policies, internally represented goals in deep learning models, the inner alignment problem, deceptive alignment in AI systems, and the risks of AGIs seeking power. They emphasize the need for preventative measures to maintain human control over AGI.