

Neel Nanda on Avoiding an AI Catastrophe with Mechanistic Interpretability
Feb 16, 2023
Chapters
Introduction
00:00 • 3min
Why Should Anyone Care About Machine Learning?
03:13 • 3min
Mechanistic Interpretability in Deep Learning - Multiple Results
06:39 • 2min
Toy Models of Superposition From Anthropic
08:27 • 5min
Are There Concepts That Humans Do Not Have That Could Be Found in Artificial Neural Networks?
13:04 • 2min
How Promising Is Mechanistic Interpretability?
15:30 • 5min
A Transformer Is a Sequence Modeling Architecture
20:04 • 2min
How to Predict What Comes After Apple?
22:23 • 2min
Induction Heads: The Most Thoroughly Reverse-Engineered Circuit in Mechanistic Interpretability So Far
24:05 • 2min
Reverse Engineering Induction Heads
26:14 • 5min
Using Induction Heads in Artificial Intelligence Models
31:14 • 2min
How Does Mechanistic Interpretability Help Reduce AI Risk?
32:50 • 4min
Is Mechanistic Interpretability a Part of AI Safety?
36:52 • 3min
Can Future Language Models Deceive Us?
39:58 • 3min
Is Mechanistic Interpretability Not Fast Enough?
43:27 • 5min
Could AIs Out-Compete Systems That Translate Their Reasoning to Humans?
48:12 • 5min
Is Mechanistic Interpretability Really Necessary?
52:48 • 3min
How to Get Into Mech Interp?
56:16 • 4min
Getting Into the Computer Science Field
59:52 • 2min