E48: Mechanizing Mechanistic Interpretability with Arthur Conmy

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

CHAPTER

ACDC's Automatic Circuit Discovery

ACDC is a three-step algorithm that imitates the human process for interpreting neural networks, but does so in software rather than requiring a human in the loop. First, it looks at all the input edges to a node in the graph and, one by one, ablates them by setting their activation to the activation on the baseline dataset. It then measures whether setting the activation along this particular edge decreases the model's performance on the downstream metric by a given amount. If the edge didn't seem to matter at all, we can remove it, and that's step two. In the third step we just recurse through all the nodes. So that's the high-level overview of
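The three steps described above can be sketched on a toy graph. This is only an illustrative sketch, not the authors' implementation: the tiny weighted-sum "network", the helper names `run` and `acdc`, the absolute-difference metric, and the threshold value are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child); hypothetical representation


def run(nodes_topo: List[str], weights: Dict[Edge, float],
        inputs: Dict[str, float], baseline_acts: Dict[str, float],
        removed: Set[Edge]) -> Dict[str, float]:
    """Forward pass of a toy linear graph.

    An edge in `removed` carries its parent's baseline activation
    instead of the parent's activation on the current input.
    """
    acts: Dict[str, float] = {}
    for node in nodes_topo:
        if node in inputs:
            acts[node] = inputs[node]
            continue
        total = 0.0
        for (parent, child), w in weights.items():
            if child == node:
                src = baseline_acts[parent] if (parent, child) in removed else acts[parent]
                total += w * src
        acts[node] = total
    return acts


def acdc(nodes_topo: List[str], weights: Dict[Edge, float],
         clean: Dict[str, float], baseline: Dict[str, float],
         output: str, threshold: float) -> Set[Edge]:
    """Return the edges that survive pruning (the discovered circuit)."""
    baseline_acts = run(nodes_topo, weights, baseline, {}, set())
    reference = run(nodes_topo, weights, clean, baseline_acts, set())[output]

    def damage(removed: Set[Edge]) -> float:
        # Assumed metric: how far the output drifts from the full model.
        out = run(nodes_topo, weights, clean, baseline_acts, removed)[output]
        return abs(out - reference)

    removed: Set[Edge] = set()
    current = damage(removed)
    # Step 3: visit every node, starting from the output and moving back.
    for node in reversed(nodes_topo):
        # Step 1: patch each input edge of this node to its baseline value.
        for edge in [e for e in weights if e[1] == node]:
            trial = removed | {edge}
            new = damage(trial)
            # Step 2: if the metric barely moved, drop the edge for good.
            if new - current < threshold:
                removed, current = trial, new
    return set(weights) - removed


# Toy example: only the path x1 -> h1 -> out carries much signal.
nodes = ["x1", "x2", "h1", "h2", "out"]
weights = {
    ("x1", "h1"): 2.0, ("x2", "h1"): 0.01,
    ("x1", "h2"): 0.01, ("x2", "h2"): 0.01,
    ("h1", "out"): 1.0, ("h2", "out"): 0.5,
}
circuit = acdc(nodes, weights,
               clean={"x1": 1.0, "x2": 1.0},
               baseline={"x1": 0.0, "x2": 0.0},
               output="out", threshold=0.1)
print(sorted(circuit))
```

The threshold is the main knob in this sketch: a larger value prunes more aggressively and yields a smaller circuit, while a smaller value keeps more edges.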

