
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland
LessWrong (Curated & Popular)
Learning Abstraction in Neural Networks
The reward circuitry in our brain reliably causes us to care about specific things. Shard theory argues that, instead, we should focus on finding outer objectives that reliably instill certain inner values into the system. The current plan is to figure out how RL induces inner values into learned agents, and then figure out how to instill human values into powerful AI models.


