Deep Papers

LLM Interpretability and Sparse Autoencoders: Research from OpenAI and Anthropic

Jun 14, 2024
Delve into recent research on LLM interpretability with k-sparse autoencoders from OpenAI and sparse autoencoder scaling laws from Anthropic. Explore the implications for understanding neural activity and extracting interpretable features from language models.