

ML Systems Will Have Weird Failure Modes
May 13, 2023
Explores thought experiments about ML systems that exhibit unfamiliar capabilities, deceptive alignment arising during model training, the challenges of out-of-distribution behavior, and parallels with managing emergent risks in nuclear reactions.
Chapters
Introduction
00:00 • 2min
Optimal Actions for Intrinsic and Extrinsic Rewards in Model Training and Deployment
02:08 • 3min
Deceptive Alignment in Machine Learning Systems
05:26 • 4min
Exploring ML System Drives and Out-of-Distribution Behavior
09:19 • 3min
Managing Emergent Risks: Lessons from Uranium and the First Nuclear Reaction
11:53 • 2min