3min chapter

#107 – Chris Olah on what the hell is going on inside neural networks

80,000 Hours Podcast

CHAPTER

Is This Really Helping Us Make Better Predictions?

To me it does just seem like you're making specific claims about the functions of different circuits. It turns out it's actually very difficult to ask that question in a really rigorous way, because these functions are so nonlinear and complicated. At a high level, this disagreement is partly because people just don't agree on what it is to understand a neural network, what it is to have interpretability.

00:00
Speaker 1
Yeah.
Speaker 2
Maybe I've just been influenced so much by reading the articles you've contributed to writing and your framing of it all, but to me it does just seem like you're making specific claims about the functions of different circuits. Like, if you take this cluster of neurons and so on, this is what it does. And so you're right, you could say, well, should we look into this? It will depend on: is this useful? Is this helping us make better predictions about stuff we actually care about, in terms of what it does? But just like you can look at a machine and say, well, this is what a gear does, similarly you can say that about circuits. And having that level of understanding, that's one sense of interpretability: I look at this machine and I understand what each part is contributing to. That seems very natural to me. Yes.
Speaker 1
And I would describe this as being, like, mechanistic interpretability. Or some people would call it transparency: the sort of understanding of this as a mechanism, or understanding what causes it to work. And you can have other types of work that are maybe a little bit less focused on this. There's a lot of work on saliency maps, which try to highlight, when you have an image classifier, what parts of the image were important in producing the final answer the model gave you. And for that, you might want to ask the question differently. It turns out it's actually very difficult to ask that question in a really rigorous way, because these functions are so nonlinear and complicated. And for that it might make more sense to ask it through this sort of HCI-type lens of, you know, is the explanation useful? Is it causing users to make more accurate predictions?
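[For readers unfamiliar with the saliency maps mentioned above, here is a minimal sketch of the common gradient-based variant, assuming PyTorch and torchvision are available. It is an illustration of the general technique, not a method described by the speakers; the model choice and input are placeholders.]

```python
import torch
import torchvision.models as models

# Load any pretrained, differentiable image classifier (ResNet-18 here as a placeholder).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# A single preprocessed image tensor of shape (1, 3, 224, 224); random data
# stands in for a real, normalized image.
image = torch.rand(1, 3, 224, 224, requires_grad=True)

# Forward pass, then backpropagate from the top predicted class score.
logits = model(image)
top_class = logits.argmax(dim=1).item()
logits[0, top_class].backward()

# The saliency map is the per-pixel magnitude of the input gradient: pixels whose
# small changes most affect the class score are highlighted as "important".
saliency = image.grad.abs().max(dim=1).values.squeeze(0)  # shape (224, 224)
print(saliency.shape)
```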
Speaker 2
So at a high level, you think this disagreement is in part because people just don't agree on what it is to understand a neural network, what it is to have interpretability. And maybe over time, we should expect that, as with other fields before it, people will, well, maybe there will end up being multiple different conceptions of interpretability, but people will understand them more crisply and say, well, we have it in this sense, but not in that sense. Or maybe people will converge on a common idea of what interpretability is, one that is probably the most useful for the actual work they're trying to do.
Speaker 1
Yes, I think that's right. I think you could either have multiple fields form, or often, at least according to Thomas Kuhn, one paradigm will eventually win out. I guess that's what I find really interesting about Kuhn's description of this: I think the thing he sees as central to a paradigm isn't, you know, the particular theories that somebody has. You often develop these different schools that have different ways of thinking about a problem. With electricity, we ended up with multiple schools: some were more focused on how charges repel, and some were more interested in current. And I think the current one sort of won out. And I think to Kuhn, the important thing isn't the particular theories they had, but the phenomena they were choosing to pay attention to. And so from that lens, maybe the thing that is central to, like, the circuits paradigm of going and approaching this — and I should say, I don't think it's just us; there's probably other work that embodies a similar paradigm, and I don't want to claim it entirely for us — the core of that is paying attention to features and how they connect to each other, and having that be the phenomena that you focus on.
