
[Cognitive Revolution] The Tiny Model Revolution with Ronen Eldan and Yuanzhi Li of Microsoft Research

Latent Space: The AI Engineer Podcast

NOTE

How to Maximize the Activation of a Neuron

As transformers become deeper and larger, they become messier and less interpretable.
Larger transformers have the luxury of using multiple attention heads or layers to simulate a task, which reduces the need for precision and conservatism.
Interpreting neurons in the MLP (middle) layers is similar to interpreting attention heads, since both are just different coordinates in a vector space.
The technique of identifying the tokens that maximize a neuron's activation follows an idea from the 2015 paper "Visualizing and Understanding Neural Models in NLP".
When examining the top-activating tokens in larger models like GPT-2 XL, no common meaning can be found among them.
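As a rough illustration of the technique described above (not code from the episode), here is a minimal sketch of ranking tokens by how strongly they activate a single MLP neuron, using GPT-2 small via Hugging Face transformers. The layer index, neuron index, and tiny text corpus are arbitrary choices for illustration; a real experiment would sweep many neurons over a large corpus.

```python
# Sketch: find tokens that maximize one MLP neuron's activation in GPT-2,
# in the spirit of "Visualizing and Understanding Neural Models in NLP" (2015).
# LAYER, NEURON, and the texts below are illustrative, not from the episode.

import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

LAYER, NEURON = 5, 123  # arbitrary neuron in the MLP of block 5

captured = []  # per-forward-pass activations of the chosen neuron

def hook(module, inputs, output):
    # output has shape (batch, seq_len, inner_dim) after the MLP's first
    # projection and GELU; keep this neuron's value at every token position
    captured.append(output[0, :, NEURON].detach())

# In Hugging Face GPT-2, the MLP nonlinearity is the module h[LAYER].mlp.act
handle = model.h[LAYER].mlp.act.register_forward_hook(hook)

texts = [
    "The cat sat on the mat.",
    "Stock prices fell sharply on Monday.",
    "She whispered a quiet thank you.",
]

scores = []  # (activation value, token string)
with torch.no_grad():
    for text in texts:
        captured.clear()
        enc = tokenizer(text, return_tensors="pt")
        model(**enc)
        acts = captured[0]
        for tok_id, a in zip(enc["input_ids"][0], acts):
            scores.append((a.item(), tokenizer.decode([tok_id.item()])))

handle.remove()

# Tokens that maximize this neuron's activation, highest first
for a, tok in sorted(scores, reverse=True)[:10]:
    print(f"{a:+.3f}  {tok!r}")
```

If the top-activating tokens share an obvious theme, the neuron has a readable interpretation; in larger models such as GPT-2 XL, the episode notes, that shared meaning tends not to emerge.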
