4 min chapter

Neel Nanda - Mechanistic Interpretability

Machine Learning Street Talk (MLST)

CHAPTER

Polysemanticity: A Theory of Neural Networks

"We are trying to engage with models as these high-dimensional objects in kind of this conceptual way, so we need to be able to decompose them because of the curse of dimensionality." "Polysemanticity is a behavioral observation of networks that when we look at neurons and look at things that activate them, they're often activated by seemingly unrelated things, like the 'uhs' in the word 'strangers', or capital letters of proper nouns, and musicals about football. That's a particularly fun neuron I found one time in a language model," he says. "It's possible that actually we're missing some galaxy-brained abstraction where all of this is related."
