
Localizing and Editing Knowledge in LLMs with Peter Hase - #679
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
The Three Key Research Areas in AI Models
The speaker's research interests cover three areas: interpretability, model editing, and scalable oversight. Interpretability focuses on understanding the internal reasoning processes of language models in order to judge how trustworthy they are and how well they generalize. Model editing involves updating the factual knowledge stored in language models, with applications such as deleting information. Scalable oversight aims to supervise and evaluate AI systems as they become more capable at solving tasks, with a particular focus on safety.