You’re probably underutilizing your GPUs

Nov 25, 2025

Jared Quincy Davis, CEO and co-founder of Mithril, discusses GPU utilization. He challenges the notion of a GPU shortage, arguing that inefficient resource allocation is the real issue. Jared explains how multi-cloud orchestration can improve GPU usage and why many clouds still rely on single-tenant allocations. He also covers the benefits of using older GPUs for certain workloads and advocates a mixed-fleet approach using specialized models.

INSIGHT

GPU Shortage Is An Efficiency Issue

The perceived GPU shortage is largely an efficiency and allocation problem, not an absolute scarcity of capacity.
Jared Quincy Davis argues that defensive buying and stranded capacity prevent elastic, cloud-style sharing.

INSIGHT

Large Models Need Contiguous GPU Topology

Large model workloads require multi-node, contiguous GPU allocations and high-bandwidth interconnect.
That contiguity constraint makes GPU scheduling more like Tetris than selling independent units.
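
To make the Tetris comparison concrete, here is a minimal sketch in Python. It assumes a deliberately simplified model that is not from the episode: nodes sit in a single row, and a job needs a run of adjacent free nodes. The function name and the example racks are hypothetical.

```python
from typing import Optional, Sequence


def find_contiguous_block(free: Sequence[bool], nodes_needed: int) -> Optional[int]:
    """Return the start index of the first run of `nodes_needed` adjacent
    free nodes, or None if no such run exists."""
    if nodes_needed <= 0:
        raise ValueError("nodes_needed must be positive")
    run_start = 0
    run_length = 0
    for i, is_free in enumerate(free):
        if is_free:
            if run_length == 0:
                run_start = i  # a new run of free nodes begins here
            run_length += 1
            if run_length == nodes_needed:
                return run_start
        else:
            run_length = 0  # a busy node breaks the run
    return None


# Hypothetical 8-node racks (True = free). Both have 5 free nodes.
fragmented = [True, False, True, True, False, True, False, True]
packed = [False, False, False, True, True, True, True, True]

# Same free capacity, but only the packed rack can host a 4-node job.
fragmented_start = find_contiguous_block(fragmented, 4)
packed_start = find_contiguous_block(packed, 4)
```

In this toy model both racks have the same number of idle nodes, yet the fragmented one cannot place a 4-node job. That gap between idle capacity and usable capacity is one way to picture the stranded capacity described above.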

ADVICE

Price Workloads, Not Hours

Price workloads, not GPU hours, and reward flexible workloads with lower cost.
Use extreme preemptibility and priority-based auctions to route work and increase overall utilization.
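
As an illustration of how a priority-based auction with preemption might behave, here is a toy Python sketch. It is an assumption-laden example, not a description of Mithril's mechanism: the `Bid` fields, the greedy highest-bid-first rule, and the job names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Bid:
    job: str
    gpus: int
    price_per_gpu_hour: float  # hypothetical bid; a higher bid means higher priority


def allocate(bids: List[Bid], capacity: int) -> List[str]:
    """Greedily admit jobs from highest to lowest bid while they still fit."""
    running = []
    remaining = capacity
    for bid in sorted(bids, key=lambda b: b.price_per_gpu_hour, reverse=True):
        if bid.gpus <= remaining:
            running.append(bid.job)
            remaining -= bid.gpus
    return running


def preempted(before: List[str], after: List[str]) -> List[str]:
    """Jobs that were running before re-allocation and are not running after."""
    return [job for job in before if job not in after]


# Hypothetical 8-GPU pool with two jobs that exactly fill it.
bids = [Bid("batch-eval", 4, 1.0), Bid("finetune", 4, 2.0)]
before = allocate(bids, capacity=8)

# A higher bid arrives; the lowest-priced job is preempted to make room.
bids.append(Bid("training", 4, 3.0))
after = allocate(bids, capacity=8)
evicted = preempted(before, after)
```

In this sketch the flexible, low-bid job pays less but yields its GPUs when a higher bid arrives, so the pool stays full instead of holding idle capacity in reserve.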
