5min chapter

19 - Mechanistic Interpretability with Neel Nanda

AXRP - the AI X-risk Research Podcast

CHAPTER

The Science of Deep Learning and Mechanistic Interpretability

There are two stories for why mechanistic interpretability could be useful. One is that you want to develop these mechanistic interpretability tools, and the way you use them is that one day you're going to train a model and you'll want to know whether it's a good model or a bad model in terms of how it's thinking about stuff. Then there's another story where you're like, okay, I want to understand how this thing works, without knowing in advance which particular piece of knowledge is going to be important. It can be a mistake to be too goal-directed when trying to do basic science. Lots of things that get done will turn out
