"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland

LessWrong (Curated & Popular)

Shard Theory, Truthful AI, and Owen Cotton-Barratt

Shard theory also proposes a subagent theory of mind. This has some similarities to brain-like AGI safety, and has drawn on some research from this post. Opinion: this is promising so far and deserves a lot more work, to try to find a reliable way to implant certain inner values into trained systems. I view shard theory as a useful frame for alignment already, even if it doesn't go anywhere else.

