
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland
LessWrong (Curated & Popular)
The Problem of Learning a Classifier
Stuart Armstrong: There are many ways for an AI to generalize off-distribution, so an arbitrary generalization is very likely to be unaligned. Stuart's plan to solve this problem is as follows: one, maintain a set of all possible extrapolations of the reward data that are consistent with the training process; two, pick among these for a safe reward extrapolation.
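The two steps above can be illustrated with a toy sketch. Everything here is hypothetical scaffolding (the candidate reward functions, the `consistent` check, and the `safety_score` stand-in are all illustrative, not part of Stuart's actual proposal):

```python
# Toy sketch of the two-step plan (all names hypothetical):
# 1) keep every reward extrapolation consistent with the training data,
# 2) pick a safe one among them.

def consistent(reward_fn, training_data):
    """A candidate is consistent if it reproduces the observed rewards."""
    return all(reward_fn(state) == observed for state, observed in training_data)

def safe_extrapolation(candidates, training_data, safety_score):
    # Step 1: the set of extrapolations consistent with the training process.
    consistent_set = [r for r in candidates if consistent(r, training_data)]
    # Step 2: pick the safest among them (here: maximize a stand-in score).
    return max(consistent_set, key=safety_score)

# Tiny illustration: training data only covers states 0 and 1.
training_data = [(0, 0.0), (1, 1.0)]
candidates = [
    lambda s: float(s),        # extrapolates linearly off-distribution
    lambda s: float(s == 1),   # rewards only state 1
]
# Both candidates agree on the training data; off-distribution they diverge,
# and a (stand-in) safety criterion breaks the tie.
chosen = safe_extrapolation(candidates, training_data,
                            safety_score=lambda r: -r(100))
print(chosen(100))  # the selected extrapolation assigns low reward off-distribution
```

The point of the sketch is only the shape of the problem: the training data underdetermines the reward function, so some additional criterion must select among the consistent extrapolations.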


