
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland
LessWrong (Curated & Popular)
Infra-Bayesian Physicalism (IBP)
IBP allows agents to have abstract models of themselves, and thus works in an embedded setting. Vanessa uses this formalism to describe PreDCA, an alignment proposal based on IBP. Vanessa models an AI as a model-based RL system with a world model (WM), a reward function, and a policy derived from the WM plus the reward. She claims that this avoids the sharp left turn. The generalization problems come from the world model. Whether this proposal also solves inner alignment is still an open question, but there is some chance that it does.
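To make the three-part decomposition above concrete, here is a minimal sketch of a generic model-based RL agent in Python. It is not PreDCA and does not use any infra-Bayesian machinery. The toy states, the `derive_policy` helper, and the fixed planning horizon are all assumptions made for illustration. The only point is that the policy is not learned directly: it is computed from the world model plus the reward function.

```python
from typing import Callable, Dict, List, Tuple

State = str
Action = str

# Hypothetical toy world model: deterministic (state, action) -> next state.
WorldModel = Dict[Tuple[State, Action], State]


def derive_policy(
    world_model: WorldModel,
    reward: Callable[[State], float],
    actions: List[Action],
    horizon: int = 3,
) -> Callable[[State], Action]:
    """Return a policy obtained by exhaustive lookahead in the world model."""

    def value(state: State, depth: int) -> float:
        # Best achievable return from `state` with `depth` steps remaining.
        if depth == 0:
            return 0.0
        return max(q(state, a, depth) for a in actions)

    def q(state: State, action: Action, depth: int) -> float:
        # Reward of the predicted next state plus the best continuation.
        next_state = world_model[(state, action)]
        return reward(next_state) + value(next_state, depth - 1)

    def policy(state: State) -> Action:
        return max(actions, key=lambda a: q(state, a, horizon))

    return policy


# Assumed toy environment: a pit to the left, a goal two steps to the right.
world_model: WorldModel = {
    ("start", "left"): "pit",
    ("start", "right"): "hall",
    ("hall", "left"): "start",
    ("hall", "right"): "goal",
    ("pit", "left"): "pit",
    ("pit", "right"): "pit",
    ("goal", "left"): "goal",
    ("goal", "right"): "goal",
}


def reward(state: State) -> float:
    # Assumed reward function: +1 in the goal, -1 in the pit, 0 elsewhere.
    if state == "goal":
        return 1.0
    if state == "pit":
        return -1.0
    return 0.0


policy = derive_policy(world_model, reward, ["left", "right"])
print(policy("start"))
```

In this sketch, any error in `world_model` flows straight into the derived policy, which is one way to read the remark that the generalization problems come from the world model.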


