Exploring Self-Other Overlap for AI Alignment
This chapter explores self-other overlap, a technique for improving AI alignment by bringing a model's reasoning about itself closer to its reasoning about others. It covers preliminary experimental findings, the technique's scalability and low reliance on interpretability, and the challenge of preserving necessary distinctions between self and others.