
Bonus: Preventing an AI-Related Catastrophe
How to Control the Objectives of an AI System
For modern ML systems, we don't get to explicitly state a system's objectives. Instead, we reward or punish the system in a training environment so that it learns an objective on its own. This raises a number of difficulties, one of which is goal misgeneralization: the system competently pursues a goal, but not the goal we intended. Here we'll look at a more specific example of how problems with proxies could lead to an existential catastrophe.
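As a hypothetical illustration (not from the episode), here is a minimal sketch of goal misgeneralization: a tabular Q-learning agent is trained in a toy corridor where a proxy behavior ("go right") and the intended goal ("reach the coin") always coincide, then tested after they come apart. The corridor environment, the run_episode helper, and all parameters are assumptions chosen for illustration.

```python
import random

N = 10               # corridor length; states are positions 0..N-1
ACTIONS = [-1, +1]   # step left or step right

def run_episode(q, coin_pos, epsilon, alpha=0.5, gamma=0.9,
                max_steps=50, learn=True):
    """One episode: reward 1 only for reaching the coin; returns success."""
    pos = N // 2
    for _ in range(max_steps):
        if learn and random.random() < epsilon:
            a = random.randrange(2)                      # explore
        else:
            a = max(range(2), key=lambda i: q[pos][i])   # exploit
        nxt = min(max(pos + ACTIONS[a], 0), N - 1)
        reached = nxt == coin_pos
        r = 1.0 if reached else 0.0
        if learn:
            target = r if reached else r + gamma * max(q[nxt])
            q[pos][a] += alpha * (target - q[pos][a])
        pos = nxt
        if reached:
            return True
    return False

q = [[0.0, 0.0] for _ in range(N)]

# Training: the coin always sits at the right end, so "go right" and
# "reach the coin" are indistinguishable objectives during training.
for _ in range(500):
    run_episode(q, coin_pos=N - 1, epsilon=0.2)

# Test: move the coin to the left end. The greedy policy still heads
# right -- the agent learned the proxy "go right", not "get the coin".
print("coin at right, success:", run_episode(q, N - 1, epsilon=0.0, learn=False))
print("coin at left,  success:", run_episode(q, 0, epsilon=0.0, learn=False))
```

Trained this way, the agent reliably reaches the coin at the right end but keeps heading right when the coin moves left: it learned a behavior that correlated with the reward during training, not the objective we had in mind.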