
Bonus: Preventing an AI-Related Catastrophe
How to Control the Objectives of an AI System
For modern ML systems, we don't get to explicitly state a system's objectives. Instead, we reward or punish the system in a training environment so that it learns an objective on its own. This raises a number of difficulties, one of which is goal misgeneralization: the system competently pursues a goal, but not the goal we intended. Here we'll look at a more specific example of how problems with proxies could lead to an existential catastrophe.
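As a hypothetical illustration (not from the episode), here is a minimal sketch of goal misgeneralization: a tabular Q-learning agent is trained in a toy corridor where a proxy behavior ("go right") and the intended goal ("reach the coin") always coincide, then tested after they come apart. The corridor environment, the run_episode helper, and all parameters are assumptions chosen for illustration.

```python
import random

N = 10               # corridor length; states are positions 0..N-1
ACTIONS = [-1, +1]   # step left or step right

def run_episode(q, coin_pos, epsilon, alpha=0.5, gamma=0.9,
                max_steps=50, learn=True):
    """One episode: reward 1 only for reaching the coin; returns success."""
    pos = N // 2
    for _ in range(max_steps):
        if learn and random.random() < epsilon:
            a = random.randrange(2)                      # explore
        else:
            a = max(range(2), key=lambda i: q[pos][i])   # exploit
        nxt = min(max(pos + ACTIONS[a], 0), N - 1)
        reached = nxt == coin_pos
        r = 1.0 if reached else 0.0
        if learn:
            target = r if reached else r + gamma * max(q[nxt])
            q[pos][a] += alpha * (target - q[pos][a])
        pos = nxt
        if reached:
            return True
    return False

q = [[0.0, 0.0] for _ in range(N)]

# Training: the coin always sits at the right end, so "go right" and
# "reach the coin" are indistinguishable objectives during training.
for _ in range(500):
    run_episode(q, coin_pos=N - 1, epsilon=0.2)

# Test: move the coin to the left end. The greedy policy still heads
# right -- the agent learned the proxy "go right", not "get the coin".
print("coin at right, success:", run_episode(q, N - 1, epsilon=0.0, learn=False))
print("coin at left,  success:", run_episode(q, 0, epsilon=0.0, learn=False))
```

Trained this way, the agent reliably reaches the coin at the right end but keeps heading right when the coin moves left: it learned a behavior that correlated with the reward during training, not the objective we had in mind.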