

35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization
Aug 24, 2024
In this discussion, Peter Hase, a researcher specializing in large language models, examines whether LLMs can meaningfully be said to have beliefs, and how such beliefs might be detected and edited. A key focus is the difficulty of interpreting neural representations and what belief localization implies for model knowledge editing. The conversation also covers easy-to-hard generalization: whether models supervised on easier tasks can generalize to harder ones. Throughout, Peter blends philosophical questions about belief with practical AI research.
Chapters
Intro (00:00 • 2 min)
Understanding AI Interpretability (02:14 • 18 min)
Understanding Neural Representations (19:46 • 13 min)
Exploring Beliefs in Language Models (32:23 • 28 min)
Challenging Assumptions in Model Knowledge Editing (01:00:13 • 4 min)
Understanding Residual Layers and Information Flow in Language Models (01:04:31 • 5 min)
Understanding Belief Localization in Neural Networks (01:10:00 • 17 min)
Exploring Scalable Oversight and Its Connection to AI Research (01:27:22 • 5 min)
Navigating Supervision Gaps in AI Training (01:32:48 • 20 min)
Challenges of AI Alignment and Generalization (01:52:45 • 19 min)
Exploring Methodological Nuances in Research Interpretation (02:12:10 • 5 min)