AI Safety Fundamentals: Alignment

ML Systems Will Have Weird Failure Modes

May 13, 2023
This episode explores thought experiments about ML systems that exhibit unfamiliar capabilities, deceptive alignment arising during model training, the challenges of out-of-distribution behavior, and parallels with managing emergent risks in nuclear reactions.