Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small

AI Safety Fundamentals: Alignment

CHAPTER

Exploring Circuits and Knockouts in Computational Models

This chapter covers how circuits are defined and how they function within computational models. It introduces the concept of knockouts and the mean-ablation method for analyzing the impact of individual nodes in a graph.
