
Neel Nanda on Avoiding an AI Catastrophe with Mechanistic Interpretability
Future of Life Institute Podcast
Reverse Engineering Induction Heads
Induction heads are a general algorithm that can work on arbitrary words, but which does this very specific task. They appear in all the open-source models I could find, and they're really important for what we call in-context learning. The point at which models become able to do this coincides exactly with the dramatic bit of training where induction heads are learnt.
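The excerpt does not spell out the "very specific task", so the following is a minimal sketch of the rule usually attributed to induction heads: if the current token appeared earlier in the context, predict whatever token followed it last time. The function name `induction_predict` and the list-of-strings token representation are assumptions made for illustration. A real induction head implements this with attention over learned representations, not an explicit loop.

```python
from typing import Optional, Sequence


def induction_predict(tokens: Sequence[str]) -> Optional[str]:
    """Toy version of the induction rule: [A][B] ... [A] -> predict [B].

    Hypothetical helper for illustration only; it mimics the behaviour,
    not the mechanism, of an induction head.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan earlier positions from most recent to oldest, skipping the
    # final position (which is the current token itself).
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # Copy the token that followed the earlier occurrence.
            return tokens[i + 1]
    # No earlier occurrence, so the rule makes no prediction.
    return None


# The rule does not depend on what the words are, only on repetition.
print(induction_predict(["Neel", "Nanda", "said", "that", "Neel"]))
```

Because the rule only checks for a repeated token and copies its successor, it works on arbitrary words, including ones never paired before, which is the sense in which the excerpt calls it a general algorithm.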