NVIDIA AI Podcast

Lowering the Cost of Intelligence With NVIDIA's Ian Buck - Ep. 284

Dec 29, 2025
Ian Buck, NVIDIA's VP of Hyperscale and High-Performance Computing, dives into the fascinating world of mixture-of-experts (MoE) architecture. He explains how MoE allows smarter AI models to operate with reduced compute costs by activating only necessary components. Buck highlights the importance of NVLink for maximizing performance and discusses the trade-offs between using MoE versus smaller models. Moreover, he envisions MoE's potential across various domains beyond just language, making a compelling case for its future impact.
AI Snips
INSIGHT

MoE Lets Models Be Larger Yet Cheaper

  • Mixture‑of‑experts (MoE) splits a large model into smaller expert modules and only activates the needed ones for each token.
  • That sparsity lets models be massively larger while making per‑token computation much cheaper.
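A minimal sketch of the per‑token routing idea behind this insight, in Python with NumPy. The expert count, top‑k value, and layer sizes below are hypothetical choices for illustration; the episode does not tie MoE to any particular configuration, and real models use learned routers with load‑balancing tricks that are omitted here.

# Minimal mixture-of-experts routing sketch (illustrative sizes only).
import numpy as np

rng = np.random.default_rng(0)

d_model   = 64      # token embedding width
d_hidden  = 256     # hidden width of each expert MLP
n_experts = 8       # total experts (all count toward model size)
top_k     = 2       # experts actually run per token (the "active" compute)

# One small two-layer MLP per expert.
experts = [
    (rng.standard_normal((d_model, d_hidden)) * 0.02,
     rng.standard_normal((d_hidden, d_model)) * 0.02)
    for _ in range(n_experts)
]
# Router: a single linear layer that scores every expert for each token.
router_w = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(tokens: np.ndarray) -> np.ndarray:
    """Route each token to its top_k experts and mix their outputs."""
    logits = tokens @ router_w                       # (n_tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)       # softmax over experts
    chosen = np.argsort(-probs, axis=-1)[:, :top_k]  # indices of selected experts

    out = np.zeros_like(tokens)
    for t, token in enumerate(tokens):
        weights = probs[t, chosen[t]]
        weights = weights / weights.sum()            # renormalise over selected experts
        for w, e in zip(weights, chosen[t]):
            w1, w2 = experts[e]
            # Only these top_k experts' parameters are touched for this token.
            out[t] += w * (np.maximum(token @ w1, 0.0) @ w2)
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)  # (4, 64): full model capacity, only top_k/n_experts of the compute

The key point the snippet shows: every expert's parameters exist (the model can be large), but each token only pays for the top_k experts the router selects.
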
INSIGHT

Sparser Activation Boosts Performance Per Cost

  • MoE models can achieve much higher benchmark scores while activating far fewer parameters per query.
  • That yields big intelligence gains at substantially lower token cost versus fully dense large models.
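A back‑of‑envelope sketch of that cost argument. It uses the common rule of thumb that a forward pass costs roughly 2 FLOPs per active parameter; that approximation and all model sizes below are assumptions for illustration, not figures from the episode.

# Rough per-token compute: dense model vs. MoE with few active parameters.
# All parameter counts are hypothetical.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass cost: ~2 FLOPs per parameter actually used."""
    return 2.0 * active_params

dense_params      = 70e9    # hypothetical dense model: every parameter is active
moe_total_params  = 400e9   # hypothetical MoE: far larger in total...
moe_active_params = 20e9    # ...but only a small slice runs per token

print(f"dense: {flops_per_token(dense_params):.2e} FLOPs/token")
print(f"MoE  : {flops_per_token(moe_active_params):.2e} FLOPs/token")
print(f"MoE is {dense_params / moe_active_params:.1f}x cheaper per token "
      f"while holding {moe_total_params / dense_params:.1f}x more parameters in total")
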
ANECDOTE

DeepSeek's Open Win Sparked The MoE Shift

  • DeepSeek demonstrated a world‑class MoE implementation and publicly catalyzed the MoE movement.
  • Its open paper and extreme optimizations convinced many researchers to adopt MoE architectures.