
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland

LessWrong (Curated & Popular)


The Problem of Learning a Classifier

Stuart Armstrong: There are many ways for an AI to generalize off-distribution, so it is very likely that an arbitrary generalization will be unaligned. Stuart's plan to solve this problem is as follows. One, maintain a set of all possible extrapolations of reward data that are consistent with the training process. Two, pick among these for a safe reward extrapolation.
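To make the two steps concrete, here is a minimal toy sketch in Python. It is not Stuart Armstrong's actual method: the training data, the candidate reward functions, and every function name are hypothetical, and the "safe" choice in step two is assumed to be a pessimistic pointwise minimum over the consistent candidates, which is only one possible way to pick a conservative extrapolation.

```python
from typing import Callable, Dict, List, Tuple

State = int
Reward = Callable[[State], float]

# Hypothetical training data: (state, observed reward) pairs.
TRAIN: List[Tuple[State, float]] = [(0, 0.0), (1, 1.0), (2, 2.0)]

# Hypothetical candidate extrapolations of the reward beyond the training states.
CANDIDATES: Dict[str, Reward] = {
    "linear": lambda s: float(s),
    "capped": lambda s: float(min(s, 2)),
    "reversing": lambda s: float(s) if s <= 2 else float(4 - s),
    "constant": lambda s: 1.0,
}


def consistent(candidates: Dict[str, Reward],
               data: List[Tuple[State, float]],
               tol: float = 1e-9) -> Dict[str, Reward]:
    """Step one: keep every candidate that reproduces the training data."""
    return {
        name: r
        for name, r in candidates.items()
        if all(abs(r(s) - y) <= tol for s, y in data)
    }


def pessimistic_reward(hypotheses: Dict[str, Reward]) -> Reward:
    """Step two (assumed rule): score each state by its worst-case reward
    across all hypotheses that survived step one."""
    return lambda s: min(r(s) for r in hypotheses.values())


hypotheses = consistent(CANDIDATES, TRAIN)
safe = pessimistic_reward(hypotheses)

# On the training states all surviving hypotheses agree, so nothing changes.
# Off distribution they disagree, and the pessimistic reward falls back to
# the lowest value any surviving hypothesis assigns.
best_state = max([2, 3, 10], key=safe)
```

In this toy setup the candidates disagree sharply at state 10, so an agent maximizing the pessimistic reward prefers a state where every surviving hypothesis agrees. Other selection rules are possible; the sketch only shows how "keep all consistent extrapolations, then choose conservatively" can be expressed.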

Play episode from 11:48
