AXRP - the AI X-risk Research Podcast

35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization

Aug 24, 2024
In this discussion, Peter Hase, a researcher specializing in large language models, dives into the intriguing world of AI beliefs. He explores whether LLMs truly have beliefs and how to detect and edit them. A key focus is the complexity of interpreting neural representations and the implications of localizing beliefs to particular model components. The conversation also covers easy-to-hard generalization: how models trained or supervised on easier examples perform on harder ones. Join Peter as he navigates these thought-provoking topics, blending philosophy with practical AI research.