
The Stack Overflow Podcast: You’re probably underutilizing your GPUs
Nov 25, 2025
Jared Quincy Davis, CEO and co-founder of Mithril, dives into the fascinating world of GPU utilization. He challenges the notion of a GPU shortage, asserting that inefficiencies in resource allocation are the real issue. Jared discusses how multi-cloud orchestration can optimize GPU usage and why many clouds still rely on single-tenant allocations. He also explores the benefits of using older GPUs for certain workloads and advocates for a mixed fleet approach using specialized models. Get ready to rethink your GPU strategy!
AI Snips
GPU Shortage Is An Efficiency Issue
- The perceived GPU shortage is largely an efficiency and allocation problem, not absolute capacity scarcity.
- Jared Quincy Davis argues defensive buying and stranded capacity prevent elastic cloud-style sharing.
Large Models Need Contiguous GPU Topology
- Large model workloads require multi-node, contiguous GPU allocations and high-bandwidth interconnect.
- That contiguity constraint makes GPU scheduling more like Tetris than selling independent units.
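To make the Tetris comparison concrete, here is a minimal sketch, not anything described in the episode. It assumes a hypothetical rack modeled as a row of nodes where only adjacent nodes share the fast interconnect; the function name `find_contiguous_block` and the rack layout are illustrative. It shows how a rack can have plenty of idle nodes in total and still be unable to place a job that needs them side by side.

```python
from typing import List, Optional


def find_contiguous_block(free: List[bool], nodes_needed: int) -> Optional[int]:
    """Return the start index of the first run of `nodes_needed` adjacent
    free nodes, or None if no such run exists.

    Assumption: adjacency in the list stands in for sharing a
    high-bandwidth interconnect domain. Real topologies are richer.
    """
    run_start, run_len = 0, 0
    for i, is_free in enumerate(free):
        if is_free:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == nodes_needed:
                return run_start
        else:
            run_len = 0  # a busy node breaks the run
    return None


# Hypothetical rack of 8 nodes: True means idle, False means busy.
rack = [True, False, True, True, False, True, False, True]

idle_nodes = sum(rack)                        # 5 idle nodes in total
four_node_slot = find_contiguous_block(rack, 4)
two_node_slot = find_contiguous_block(rack, 2)

print(idle_nodes, four_node_slot, two_node_slot)
```

Five of the eight nodes are idle, yet a four-node job cannot be placed because the idle nodes are scattered; only the two-node job fits. That gap between idle capacity and placeable capacity is the fragmentation the snip describes.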
Price Workloads, Not Hours
- Price workloads, not GPU hours, and reward flexibility with lower cost.
- Use extreme preemptibility and priority-based auctions to route work and increase overall utilization.
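The idea of priority-based routing with preemption can be sketched as follows. This is a simplified illustration under stated assumptions, not Mithril's actual mechanism: the `Bid` type, the `allocate` function, the job names, and the prices are all hypothetical, and the greedy highest-bid-first pass stands in for a real auction (it computes no clearing price and ignores topology).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Bid:
    job: str
    gpus: int
    price: float  # hypothetical: max price per GPU-hour the job will pay


def allocate(capacity: int, bids: List[Bid]) -> Tuple[List[str], List[str]]:
    """Greedy sketch: serve bids from highest to lowest price.

    Jobs that fit in the remaining capacity run; the rest wait.
    Re-running this whenever a bid arrives models preemption: a job
    that ran before can land in the waiting list afterwards.
    """
    running: List[str] = []
    waiting: List[str] = []
    free = capacity
    for bid in sorted(bids, key=lambda b: b.price, reverse=True):
        if bid.gpus <= free:
            running.append(bid.job)
            free -= bid.gpus
        else:
            waiting.append(bid.job)
    return running, waiting


# Two flexible, low-priced jobs share an 8-GPU pool.
bids = [Bid("batch-eval", 2, 0.50), Bid("nightly-finetune", 4, 1.20)]
before = allocate(8, bids)

# A higher-priced, urgent job arrives and the pool is re-allocated.
bids.append(Bid("prod-inference", 6, 3.00))
after = allocate(8, bids)

print(before)
print(after)
```

In this toy run, the urgent job displaces the mid-priced fine-tune, while the small low-priced job backfills the two GPUs left over, so the pool stays fully used. The flexible jobs pay less in exchange for accepting that they can be paused.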
