Neel Nanda on Avoiding an AI Catastrophe with Mechanistic Interpretability

Future of Life Institute Podcast

Toy Models of Superposition From Anthropic

A paper called "Toy Models of Superposition" from Anthropic is pretty exciting. They found that in a toy model they created, not only did it learn to use superposition, but it learned these beautiful geometric configurations. And the final work I want to highlight is this work from OpenAI called "Multimodal Neurons in Artificial Neural Networks". For example, for a drawing of Spider-Man, the name Peter Parker, and a picture of Spider-Man, the same neuron lights up. So it suggests there's some real abstraction going on.
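The toy-model setup being described can be sketched in a few lines. This is a minimal, hedged illustration (not Anthropic's actual code): a model with more features than hidden dimensions, tied encoder/decoder weights, and sparse inputs, which is the regime where the paper observed superposition. All names and sizes here are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# More features than hidden dimensions forces the model to choose
# which features to represent and whether to share directions
# ("superposition"). Sizes are illustrative, not from the paper.
n_features, d_hidden = 5, 2

W = rng.normal(size=(d_hidden, n_features)) * 0.1  # tied encoder/decoder weights
b = np.zeros(n_features)                           # output bias

def forward(x):
    """Reconstruct x through a narrow linear bottleneck:
    x_hat = ReLU(W^T W x + b), the architecture the paper studies."""
    h = W @ x                        # compress n_features into d_hidden dims
    return np.maximum(0.0, W.T @ h + b)

def sample_batch(batch, sparsity=0.8):
    """Sparse inputs: each feature is present with low probability,
    which is the regime where superposition pays off."""
    x = rng.uniform(size=(batch, n_features))
    mask = rng.uniform(size=(batch, n_features)) > sparsity
    return x * mask

x = sample_batch(4)
x_hat = np.stack([forward(xi) for xi in x])
print(x.shape, x_hat.shape)  # both (4, 5): inputs and reconstructions
```

Training this reconstruction loss is what produces the geometric feature arrangements mentioned above; the sketch only shows the architecture and input distribution.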

