

Localizing and Editing Knowledge in LLMs with Peter Hase - #679
Apr 8, 2024
Peter Hase, a fifth-year PhD student at the University of North Carolina NLP lab, dives into the fascinating world of large language models. He discusses the vital role of interpretability in AI, exploring how knowledge is stored and accessed. The conversation shifts to model editing, emphasizing the challenges of deleting sensitive information while maintaining data integrity. Hase also highlights how easy-to-hard generalization factors into the risks of releasing open-source models, and how instructional prompts affect model performance. This insightful dialogue unravels complexities in AI decision-making.
AI Snips
Fact Storage in LLMs
- Language models don't store facts in isolated locations like computer memory.
- Residual layers allow information to flow throughout the network, making editing complex.
Distributed Representations
- The traditional view of localized knowledge storage might not fully apply to LLMs.
- Information is likely distributed, with residual layers playing a key role.
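The two snips above point at the residual stream as the reason facts are hard to pin down. As a rough illustration of that picture, here is a toy sketch in PyTorch; `ToyBlock` is a hypothetical stand-in, not any model discussed in the episode. Each block only adds its output into a shared stream, so many layers can contribute to the same fact and no single module "owns" it.

```python
# Toy sketch (not a real LLM): transformer-style blocks write *additively* into a
# shared residual stream, so earlier contributions are never overwritten.
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.attn = nn.Linear(d_model, d_model)   # stand-in for self-attention
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, resid: torch.Tensor) -> torch.Tensor:
        resid = resid + self.attn(resid)  # attention adds its update to the stream
        resid = resid + self.mlp(resid)   # the MLP adds its update to the stream
        return resid                      # the final state is a sum of many layers' writes

d_model, n_layers = 16, 4
blocks = nn.ModuleList(ToyBlock(d_model) for _ in range(n_layers))
resid = torch.randn(1, 3, d_model)        # (batch, tokens, hidden)
for block in blocks:
    resid = block(resid)                  # output = input + every block's contribution
```

Because the final representation is a sum of contributions from every layer, editing a "fact" at one site can leave redundant traces of it elsewhere, which is what makes localized editing and deletion tricky.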
Causal Tracing
- Causal tracing is a denoising technique for locating where information sits inside the model.
- It corrupts the input, then substitutes clean representations back in to find which components are sufficient to recover the original answer.
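The denoising recipe can be sketched in a few dozen lines of PyTorch with forward hooks on a Hugging Face GPT-2 checkpoint. This is a minimal illustration, not the exact setup discussed in the episode: the prompt, the hard-coded `subject_span`, the noise scale, and the hook helpers are all assumptions, and it only patches the last subject token at each block.

```python
# Minimal denoising-style causal tracing sketch on a GPT-2-style model.
# Assumptions: "gpt2" checkpoint, a hand-picked prompt, and rough subject-token
# positions; in practice the subject span is derived from the tokenizer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

prompt = "The Eiffel Tower is located in the city of"
input_ids = tok(prompt, return_tensors="pt")["input_ids"]
subject_span = (1, 4)  # rough positions of the subject tokens (an assumption)

def block_output(output):
    # GPT-2 blocks return a tuple whose first element is the hidden state.
    return output[0] if isinstance(output, tuple) else output

with torch.no_grad():
    # 1) Clean run: cache each block's output and record the predicted answer token.
    clean_states = {}
    def make_cache_hook(i):
        def hook(module, args, output):
            clean_states[i] = block_output(output).clone()
        return hook
    handles = [blk.register_forward_hook(make_cache_hook(i))
               for i, blk in enumerate(model.transformer.h)]
    clean = model(input_ids)
    for h in handles:
        h.remove()
    answer_id = clean.logits[0, -1].argmax().item()

    # 2) Corrupt the subject-token embeddings with Gaussian noise, which usually
    #    destroys the model's ability to produce the original answer.
    embeds = model.transformer.wte(input_ids).clone()
    lo, hi = subject_span
    embeds[:, lo:hi] += 0.5 * torch.randn_like(embeds[:, lo:hi])

    # 3) Denoising sweep: restore the clean hidden state at the last subject token,
    #    one layer at a time, and measure how much of the original answer returns.
    def make_restore_hook(i, pos):
        def hook(module, args, output):
            block_output(output)[:, pos] = clean_states[i][:, pos]
            return output
        return hook

    last_subject = hi - 1
    for i, blk in enumerate(model.transformer.h):
        handle = blk.register_forward_hook(make_restore_hook(i, last_subject))
        patched = model(inputs_embeds=embeds)
        handle.remove()
        p = torch.softmax(patched.logits[0, -1], dim=-1)[answer_id].item()
        print(f"layer {i:2d}: p(original answer) = {p:.3f}")
```

Layers where restoring a single clean state recovers most of the original answer's probability are the components causal tracing flags as sufficient for, and hence likely involved in, producing that answer.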