3min chapter

AXRP - the AI X-risk Research Podcast cover image

19 - Mechanistic Interpretability with Neel Nanda

AXRP - the AI X-risk Research Podcast

CHAPTER

Activation Patching Is a Great Way to Find Out What a Neuron Does

activation patching is a way to flip the answer from Rome to Paris. It turns out that most things don't matter some things matter a ton and that just patching in a single activation can often be enough to like significantly flip things. There are also techniques that are much more kind of suggestive where it gives some evidence i think it should be part of a talket you use with heavy caution one example of this is a really dumb technique figuring out what a neuron does  you look at its max activating data set examples yep they're all recipes.

00:00

Get the Snipd
podcast app

Unlock the knowledge in podcasts with the podcast player of the future.
App store bannerPlay store banner

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode

Save any
moment

Hear something you like? Tap your headphones to save it with AI-generated key takeaways

Share
& Export

Send highlights to Twitter, WhatsApp or export them to Notion, Readwise & more

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode