80,000 Hours Podcast

#107 – Chris Olah on what the hell is going on inside neural networks

Aug 4, 2021
In this engaging discussion, Chris Olah, a machine learning researcher known for his work on neural network interpretability, shares his insights into the complex world of AI. He breaks down how massive models can outperform humans in tasks ranging from diagnosing diseases to writing essays. The conversation delves into pressing issues like AI safety, bias in neural networks, and the emerging concept of 'emotion neurons.' Olah emphasizes the need for better interpretability tools and collaborative efforts to ensure responsible AI development.
INSIGHT

Empirical Safety

  • Chris Olah prefers empirical safety research over theorizing.
  • He emphasizes understanding AI systems' inner workings to identify risks.
INSIGHT

Reverse Engineering AI

  • Neural networks accomplish tasks that humans don't know how to program directly.
  • Chris studies how these models work internally, much as a biologist studies an alien organism.
INSIGHT

Risks of Uninterpretable AI

  • Deploying AI in high-stakes situations without understanding it is risky.
  • Testing alone is insufficient; understanding how the AI might behave is crucial.