
LessWrong (Curated & Popular) “EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024” by scasper
May 24, 2024
This episode covers Anthropic's latest sparse autoencoder (SAE) research, highlighting its brilliant experiments and insights alongside concerns about safety washing. The author revisits earlier predictions about what the paper would accomplish and points out where it underperformed. The discussion also covers limitations of Anthropic's interpretability research, concerns about how the work is promoted, and the need to prioritize a safety agenda.
