Localizing and Editing Knowledge in LLMs with Peter Hase - #679

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

NOTE

The Three Key Research Areas in AI Models

The speaker's research interests cover interpretability, model editing, and scalable oversight in AI models. Interpretability focuses on understanding the internal reasoning processes of language models in order to assess whether their outputs are trustworthy and generalize beyond the training data. Model editing involves updating the factual knowledge stored in language models, with applications such as deleting specific information from a model. Scalable oversight aims to supervise and evaluate AI systems as they become more capable at solving tasks, with a particular focus on safety.
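As a rough illustration of the model-editing idea mentioned above (not taken from the episode), one common framing treats a linear layer as an associative key-value memory and applies a rank-one weight update so that a chosen "fact" key maps to a new value. The sketch below is a toy, self-contained assumption-laden example in that spirit (dimensions, names, and the single-layer setup are all illustrative), loosely analogous to ROME-style editing methods in the literature.

```python
# Toy sketch of rank-one model editing on a single linear layer.
# All names and dimensions are illustrative assumptions, not the method
# discussed in the episode.
import numpy as np

rng = np.random.default_rng(0)
d_key, d_val = 8, 8

W = rng.normal(size=(d_val, d_key))   # linear layer acting as an associative memory
k = rng.normal(size=d_key)            # hidden representation of the fact's subject ("key")
v_new = rng.normal(size=d_val)        # target representation encoding the edited fact ("value")

# Rank-one edit: choose delta so that (W + delta) @ k == v_new exactly.
delta = np.outer(v_new - W @ k, k) / (k @ k)
W_edited = W + delta

print("pre-edit error :", np.linalg.norm(W @ k - v_new))
print("post-edit error:", np.linalg.norm(W_edited @ k - v_new))  # ~0 after the edit
```

Deleting information can be framed the same way: instead of mapping the key to a new fact, the update steers it toward an uninformative or refusal-like representation.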
