“Self-Other Overlap: A Neglected Approach to AI Alignment” by Marc Carauleanu, Mike Vaiana, Judd Rosenblatt, Diogo de Lucena

LessWrong (Curated & Popular)

Exploring Self-Other Overlap for AI Alignment

This chapter explores self-other overlap, a technique that aims to improve AI alignment by bringing a model's reasoning about itself closer to its reasoning about others. It covers preliminary experimental findings, the approach's claimed benefits (it is scalable and requires little interpretability), and the challenge of preserving the distinctions between self and others that a model still needs.
