3min snip

Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and all things Software 3.0

[Cognitive Revolution] The Tiny Model Revolution with Ronen Eldan and Yuanzhi Li of Microsoft Research

NOTE

How to Maximize the Activation of a Neuron

- As transformers grow deeper and larger, they become messier and less interpretable.
- Larger transformers can spread a task across multiple attention heads or layers, which reduces the need for any single component to be precise or conservative.
- Interpreting neurons in the MLP (middle) layers is similar to interpreting attention heads: both are just different coordinates in a vector space.
- The technique of identifying tokens that maximize a neuron's activation follows an idea from the 2015 paper "Visualizing and Understanding Neural Models in NLP."
- When examining the top-activating tokens in larger models like GPT-2 XL, no common meaning can be found among them.
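The max-activation technique mentioned above can be sketched in a few lines: score every candidate token by the activation it produces in one chosen neuron, then sort. This is a toy illustration with random weights and a made-up vocabulary, not the actual GPT-2 internals or the 2015 paper's setup.

```python
# Toy sketch of "find the tokens that maximize a neuron's activation".
# All weights, dimensions, and the vocabulary are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran", "fast"]
d_model, d_mlp = 16, 64

# Toy token embeddings and the input weights of one MLP layer.
E = rng.standard_normal((len(vocab), d_model))   # (vocab, d_model)
W_in = rng.standard_normal((d_model, d_mlp))     # (d_model, d_mlp)

def neuron_activation(token_idx: int, neuron: int = 0) -> float:
    """Activation of one MLP neuron for a single token embedding.
    ReLU is used here for simplicity (GPT-2 actually uses GELU)."""
    h = E[token_idx] @ W_in          # pre-activations, shape (d_mlp,)
    return float(max(h[neuron], 0.0))

# Rank tokens by how strongly they activate neuron 0.
scores = [(tok, neuron_activation(i)) for i, tok in enumerate(vocab)]
scores.sort(key=lambda pair: pair[1], reverse=True)

for tok, s in scores[:3]:
    print(f"{tok}: {s:.3f}")
```

In a real model you would run this over a large corpus and read the top-scoring tokens (or contexts) to guess what the neuron responds to; the note's point is that in larger models like GPT-2 XL those top tokens often share no obvious common meaning.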
