
Localizing and Editing Knowledge in LLMs with Peter Hase - #679
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
00:00
Interpreting AI Model Decisions
An LLM's stated reasoning may not reflect truth or an internal world model, but simply next-token prediction: an explanation can align with information represented in some layer without any fundamental connection to the model's actual decision process. Different interpretive frames can therefore yield different readings of model explanations: one treats them as generated human-like text, while another treats them as genuine claims about what is true or false.