
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland

LessWrong (Curated & Popular)


Learning Abstraction in Neural Networks

The reward circuitry in our brains reliably causes us to care about specific things. Shard theory argues that, rather than searching for outer objectives that reliably instill certain inner values into a system, we should study how this process works. The current plan is to figure out how RL induces inner values in learned agents, and then figure out how to instill human values into powerful AI models.
