
Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Exploring Internal Fine-Tuning and Interpretability in Language Models
This chapter explores how language models are fine-tuned internally, contrasting this with user-driven custom fine-tuning, and traces the move into interpretability research. It highlights surprising experimental findings, particularly around structured language generation and linguistic universality, and delves into circuit tracing, dictionary learning, and embeddings as tools for understanding how these models process language and maintain coherence.