
Neel Nanda on Avoiding an AI Catastrophe with Mechanistic Interpretability

Future of Life Institute Podcast

CHAPTER

Toy Models of Superposition From Anthropic

A paper called Toy Models of Superposition from Anthropic is pretty exciting. They found that in a toy model they created, not only did it learn to use superposition, but it learned these beautiful geometric configurations. And the final work I want to highlight is this work from OpenAI called Multimodal Neurons in Artificial Neural Networks. For example, for a drawing of Spider-Man, the name Peter Parker, and a picture of Spider-Man, the same neuron lights up. So it suggests there's some real abstraction going on.
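The superposition idea above can be sketched in a few lines: store more features than you have hidden dimensions by packing the feature directions into a geometric configuration, then decode sparse inputs with a ReLU. This is a minimal illustrative sketch in the spirit of the toy architecture in the Anthropic paper; the pentagon layout and the bias value here are assumptions chosen for the demo, not numbers taken from the paper.

```python
import numpy as np

# Sketch: 5 features stored in only 2 hidden dimensions, with the feature
# directions (columns of W) arranged as a regular pentagon -- one of the
# geometric configurations the paper describes.
n_features, n_hidden = 5, 2
angles = 2 * np.pi * np.arange(n_features) / n_features
W = np.stack([np.cos(angles), np.sin(angles)])  # shape (2, 5), unit columns

def reconstruct(x, bias=-0.32):
    # Compress n_features -> n_hidden, then decode with ReLU(W^T h + b).
    # The bias is a hand-picked illustrative value that pushes the
    # interference between non-orthogonal features below zero.
    h = W @ x                          # 2-dim hidden state
    return np.maximum(0.0, W.T @ h + bias)

# A sparse input: only feature 3 is active.
x = np.zeros(n_features)
x[3] = 1.0
recon = reconstruct(x)
# Interference from the other pentagon directions is clipped by the ReLU,
# so only the active feature survives in the reconstruction.
print(np.argmax(recon))  # -> 3
```

Because the inputs are sparse, the overlapping (non-orthogonal) feature directions rarely interfere at the same time, which is exactly why superposition is a worthwhile trade for the model.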
