
Neel Nanda on Avoiding an AI Catastrophe with Mechanistic Interpretability

Future of Life Institute Podcast


Toy Models of Superposition from Anthropic

A paper called Toy Models of Superposition from Anthropic is pretty exciting. They found that in a toy model they created, not only did it learn to use superposition, but it learned these beautiful geometric configurations. And the final work I want to highlight is this work from OpenAI called Multimodal Neurons in Artificial Neural Networks. For example, for a drawing of Spider-Man, the name Peter Parker, and a picture of Spider-Man, the same neuron lights up. So it suggests there's some real abstraction going on.
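As background for the superposition discussion, here is a minimal sketch of the kind of toy model studied in that Anthropic paper: more sparse features than hidden dimensions, compressed by a single matrix W and reconstructed as ReLU(Wᵀh + b). This is an illustrative forward pass only (no training), and the variable names and sizes are assumptions for the sketch, not taken from the paper's code.

```python
import numpy as np

# Toy-model-of-superposition-style setup: n_features sparse inputs are
# compressed into d_hidden < n_features dimensions, forcing the model
# to share directions (superposition) if it wants to represent them all.
rng = np.random.default_rng(0)
n_features, d_hidden = 5, 2

W = rng.normal(size=(d_hidden, n_features))  # feature-embedding matrix
b = np.zeros(n_features)                     # output bias

def forward(x):
    h = W @ x                              # compress to d_hidden dims
    return np.maximum(0.0, W.T @ h + b)    # reconstruct with a ReLU

# A sparse input: only one of the five features is active.
x = np.zeros(n_features)
x[3] = 1.0
x_hat = forward(x)
print(x_hat.shape)  # (5,)
```

After training a model like this with sparse inputs, the columns of W are where the geometric structure shows up: with five features in two dimensions, the learned feature directions can arrange into a regular pentagon.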

