
Localizing and Editing Knowledge in LLMs with Peter Hase - #679
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Different Frames for Understanding Models' Behavior
How one interprets what language models (LMs) are capable of depends on the frame adopted. One frame holds that LMs generate human-like text and should not be expected to follow explicit reasoning steps. Another frame proposes that models internally represent truth and falsehood: they can, for example, predict the likelihood of a next paragraph and distinguish true statements from false ones. LMs are also tested for consistency in their reasoning when asked slightly different versions of the same question, much as humans are expected to maintain internal beliefs and justifications. Although models are often described as merely predicting the next token, they can exhibit surprising consistency in some scenarios.