
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland
LessWrong (Curated & Popular)
Shard Theory, Truthful AI, and Owen Cotton-Barratt
Shard theory also proposes a subagent theory of mind. This has some similarities to Brain-like AGI Safety, and has drawn on some research from this post. Opinion: This is promising so far, and it deserves a lot more work to try to find a reliable way to implant certain inner values into trained systems. I view shard theory as a useful frame for alignment already, even if it doesn't go anywhere else.


