The Nonlinear Library: LessWrong

LW - Anthropic announces interpretability advances. How much does this advance alignment? by Seth Herd

May 22, 2024