The Stack Overflow Podcast

You’re probably underutilizing your GPUs

Nov 25, 2025
Jared Quincy Davis, CEO and co-founder of Mithril, discusses GPU utilization. He challenges the notion of a GPU shortage, arguing that inefficient resource allocation is the real issue. Jared explains how multi-cloud orchestration can improve GPU usage and why many clouds still rely on single-tenant allocations. He also covers the benefits of running certain workloads on older GPUs and advocates a mixed-fleet approach built on specialized models. Get ready to rethink your GPU strategy!
INSIGHT

GPU Shortage Is An Efficiency Issue

  • The perceived GPU shortage is largely an efficiency and allocation problem, not absolute capacity scarcity.
  • Jared Quincy Davis argues that defensive buying and stranded capacity prevent elastic, cloud-style sharing.
INSIGHT

Large Models Need Contiguous GPU Topology

  • Large model workloads require multi-node, contiguous GPU allocations and high-bandwidth interconnect.
  • That contiguity constraint makes GPU scheduling more like Tetris than selling independent units.
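
A minimal sketch of the contiguity constraint, under simplifying assumptions that are not from the episode: the rack is modeled as a one-dimensional row of nodes, each either idle or busy, and a job needs a run of adjacent idle nodes. The function name `find_contiguous_block` and the example `rack` are hypothetical. It illustrates how capacity can be free in total and still unusable for a large job.

```python
from typing import List, Optional


def find_contiguous_block(free: List[bool], size: int) -> Optional[int]:
    """Return the start index of the first run of `size` idle nodes, or None."""
    if size <= 0:
        raise ValueError("size must be positive")
    run = 0  # length of the current run of idle nodes
    for i, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == size:
            return i - size + 1
    return None


# Hypothetical 8-node rack: True means the node is idle.
rack = [True, False, True, True, False, True, True, False]

idle_nodes = sum(rack)                        # 5 nodes are idle in total
two_node_job = find_contiguous_block(rack, 2)    # fits at index 2
three_node_job = find_contiguous_block(rack, 3)  # None: no run of 3 exists
```

Here five nodes are idle, yet a three-node job cannot be placed because no three idle nodes are adjacent. Real interconnect topologies are more complex than a single row, so this only conveys the shape of the problem.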
ADVICE

Price Workloads, Not Hours

  • Price workloads, not GPU hours, and reward flexibility with lower cost.
  • Use aggressive preemptibility and priority-based auctions to route work and increase overall utilization.
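
One way to picture priority-based allocation with preemption is the toy sketch below. It is an assumption-laden illustration, not the mechanism described in the episode or used by any particular provider: the `Job` type, the `bid` field, and the `allocate` function are all hypothetical, and every job is treated as fully preemptible. Capacity goes to the highest bids first, and whatever no longer fits is left waiting.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Job:
    name: str
    gpus: int
    bid: float  # hypothetical price the job offers per GPU


def allocate(jobs: List[Job], capacity: int) -> Tuple[List[Job], List[Job]]:
    """Greedily admit jobs in descending bid order; the rest wait."""
    running: List[Job] = []
    waiting: List[Job] = []
    free = capacity
    for job in sorted(jobs, key=lambda j: j.bid, reverse=True):
        if job.gpus <= free:
            running.append(job)
            free -= job.gpus
        else:
            waiting.append(job)  # preempted or queued until capacity frees up
    return running, waiting


batch = Job("batch", gpus=4, bid=1.0)
evaluation = Job("eval", gpus=4, bid=1.5)
urgent = Job("urgent", gpus=4, bid=3.0)

# With 8 GPUs, the two flexible jobs fill the cluster.
running_before, waiting_before = allocate([batch, evaluation], capacity=8)

# When a higher bid arrives, re-running the allocation displaces the lowest bid.
running_after, waiting_after = allocate([batch, evaluation, urgent], capacity=8)
```

In this example the cluster stays fully used before and after the urgent job arrives; the low-bid batch job gives up its GPUs in exchange for its lower price.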