
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland
LessWrong (Curated & Popular)
The Problem of Learning a Classifier
Stuart Armstrong: There are many ways for an AI to generalize off-distribution, so an arbitrary generalization is very likely to be unaligned. Stuart's plan to solve this problem is as follows: one, maintain a set of all possible extrapolations of the reward data that are consistent with the training process; two, pick among these for a safe reward extrapolation.
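The two steps above can be illustrated with a toy sketch. Everything here is hypothetical scaffolding (the candidate reward functions, the `consistent` check, and the `safety_score` stand-in are all illustrative, not part of Stuart's actual proposal):

```python
# Toy sketch of the two-step plan (all names hypothetical):
# 1) keep every reward extrapolation consistent with the training data,
# 2) pick a safe one among them.

def consistent(reward_fn, training_data):
    """A candidate is consistent if it reproduces the observed rewards."""
    return all(reward_fn(state) == observed for state, observed in training_data)

def safe_extrapolation(candidates, training_data, safety_score):
    # Step 1: the set of extrapolations consistent with the training process.
    consistent_set = [r for r in candidates if consistent(r, training_data)]
    # Step 2: pick the safest among them (here: maximize a stand-in score).
    return max(consistent_set, key=safety_score)

# Tiny illustration: training data only covers states 0 and 1.
training_data = [(0, 0.0), (1, 1.0)]
candidates = [
    lambda s: float(s),        # extrapolates linearly off-distribution
    lambda s: float(s == 1),   # rewards only state 1
]
# Both candidates agree on the training data; off-distribution they diverge,
# and a (stand-in) safety criterion breaks the tie.
chosen = safe_extrapolation(candidates, training_data,
                            safety_score=lambda r: -r(100))
print(chosen(100))  # the selected extrapolation assigns low reward off-distribution
```

The point of the sketch is only the shape of the problem: the training data underdetermines the reward function, so some additional criterion must select among the consistent extrapolations.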


