

Catherine Olsson and Nelson Elhage: Anthropic, Understanding Transformers
Aug 26, 2022
Chapters
Introduction
00:00 • 2min
Anthropic
02:09 • 3min
Anthropic and Interpretability - What's Next?
05:07 • 3min
Is Mechanistic Interpretability a Good Idea?
08:02 • 5min
You've Got a Program on Mechanistic Interpretability
12:50 • 5min
The OV Matrix and the Path Expansion Frame
18:05 • 2min
Toy Transformers - How Do You Even Try to Do This?
20:17 • 2min
The Basics of Attention Only Models
21:58 • 5min
What Are Induction Heads?
26:44 • 2min
Induction Heads
29:02 • 2min
What Are In-Context Learning and Induction Heads?
30:38 • 2min
The In-Context Learning Score
32:29 • 3min
Induction Heads in Larger Models - The Key Takeaway
35:09 • 4min
Getting Traction in the MLP Layers
38:51 • 4min
How to Replicate the Induction Heads Findings
42:35 • 2min
The Gradient Podcast - Part 2
44:53 • 2min