
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland
LessWrong (Curated & Popular)
Alignment Research Center - Eliciting Latent Knowledge
This approach tries to tackle distributional shift, which I might see as one of the fundamental hard parts of alignment. The problem is that I don't see how to integrate this approach to solving that problem with deep learning. It seems like this approach might work well for a model-based RL setup where you can make the AI explicitly select for this utility function. There's a footnote here after "would generate classifiers that extrapolate in all the different ways": they just need to span the set of extrapolations, so that the correct extrapolation is just a linear combination of the found classifiers. Back to the main text: they have also introduced a new dataset for this.
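To make the footnote's point concrete, here is a minimal sketch (not from the original post) of why spanning is enough: if the correct extrapolation lies in the span of the candidate classifiers, a small set of trusted labels recovers it by ordinary least squares. The names (`candidate_scores`, `trusted_labels`, the specific numbers) are illustrative assumptions, not anything from ARC's actual setup.

```python
import numpy as np

# Assumed setup: k candidate classifiers that agree on-distribution but
# extrapolate differently off-distribution. If the correct extrapolation is
# in their span, a few trusted off-distribution labels pin down the weights.
rng = np.random.default_rng(0)

k = 5            # number of candidate classifiers found
n_trusted = 20   # small set of off-distribution points we can label reliably

# Scores of each candidate classifier on the trusted points
# (rows: points, columns: classifiers). Random stand-ins here.
candidate_scores = rng.normal(size=(n_trusted, k))

# By assumption, the correct classifier is some fixed linear combination.
true_weights = np.array([0.5, -1.0, 0.0, 2.0, 0.25])
trusted_labels = candidate_scores @ true_weights

# Recover that combination by least squares on the trusted labels.
recovered_weights, *_ = np.linalg.lstsq(candidate_scores, trusted_labels, rcond=None)

print(np.allclose(recovered_weights, true_weights))  # True: the span assumption suffices
```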


