4min chapter

19 - Mechanistic Interpretability with Neel Nanda

AXRP - the AI X-risk Research Podcast

CHAPTER

Is This Path Analysis Going to Be Too Unwieldy to Be Useful?

The goal is not to study every path; the goal is to find the things we want to understand and look for the paths leading to them that matter. The model has been forced to map the 50,000 input tokens to a tiny, say 500-dimensional, bottleneck space and then back up to 50,000. It has presumably learned to compress this enormous table into something that can be done via a pretty narrow linear map. We don't try to interpret what the things in the 500-dimensional bottleneck mean; we try to interpret the start and the end, he says. "We assume that the stuff in the middle is like some carefully compressed nonsense, yeah."
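
A minimal sketch of the idea above, under stated assumptions: the sizes are scaled down (50 tokens and a 5-dimensional bottleneck stand in for roughly 50,000 and 500), the weights are random rather than learned, and the names `W_E`, `W_U`, `end_to_end_row`, and `top_outputs` are hypothetical, chosen only for illustration. It shows how one might read off the effect of an input token on the output tokens without ever interpreting the compressed vector in the middle.

```python
import random

VOCAB = 50    # stand-in for the ~50,000 tokens mentioned in the episode
D_MODEL = 5   # stand-in for the ~500-dimensional bottleneck

rng = random.Random(0)
# Hypothetical random weights; in a real model these would be learned.
W_E = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(VOCAB)]  # token -> bottleneck
W_U = [[rng.gauss(0, 1) for _ in range(VOCAB)] for _ in range(D_MODEL)]  # bottleneck -> token


def end_to_end_row(token_id):
    """Effect of one input token on every output token, via the bottleneck."""
    hidden = W_E[token_id]  # the compressed vector we do not try to interpret
    return [
        sum(hidden[k] * W_U[k][out] for k in range(D_MODEL))
        for out in range(VOCAB)
    ]


def top_outputs(token_id, n=3):
    """Output tokens most strongly promoted by the given input token."""
    row = end_to_end_row(token_id)
    return sorted(range(VOCAB), key=lambda out: row[out], reverse=True)[:n]


# The full token-to-token table versus the two narrow maps that encode it.
full_table_entries = VOCAB * VOCAB
factored_parameters = 2 * VOCAB * D_MODEL
```

Calling `top_outputs(7)` returns the three output token ids with the largest end-to-end score for input token 7. With these toy sizes the full table has 2,500 entries while the two narrow maps hold only 500 numbers, which is the compression the summary describes.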