
The New Stack Podcast Kubernetes GPU Management Just Got a Major Upgrade
Dec 11, 2025
Kevin Klues, a distinguished engineer at NVIDIA, and Jesse Butler, a principal product manager at AWS, explore recent advancements in Kubernetes for AI. They discuss Dynamic Resource Allocation (DRA), which lets users specify GPU types and configurations rather than just a device count, making GPUs far more usable on Kubernetes. Klues also highlights an upcoming workload abstraction aimed at complex AI workloads, ensuring coordinated scheduling of multi-node jobs. Both emphasize community involvement in shaping Kubernetes' AI capabilities and the importance of balancing efficiency with system complexity.
AI Snips
Hardware Triggered Kevin's AI Moment
- Kevin's 'aha' came when GPU hardware matured enough to properly accelerate long-standing algorithms.
- He credits hardware advances, not just algorithms, for mainstream AI momentum.
DRA Reimagines Hardware Allocation
- Dynamic Resource Allocation (DRA) models hardware allocation after Kubernetes storage primitives like PVCs to expose specialized devices cleanly.
- DRA lets users request device types and configurations rather than just a GPU count, shifting complexity into scheduler and drivers.
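The request-for-configuration model described above can be sketched as a DRA `ResourceClaimTemplate`. This is a minimal illustration, not from the episode: the device class name, attribute key, and product name below are assumptions about what a vendor DRA driver might publish, and the `resource.k8s.io` API group is still evolving across Kubernetes releases.

```yaml
# Sketch of a DRA ResourceClaimTemplate (resource.k8s.io/v1beta1).
# "gpu.nvidia.com" and the CEL attribute are illustrative placeholders;
# real names come from the installed DRA driver.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-a100
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com   # device class published by the driver
        selectors:
        - cel:
            # Constrain which device qualifies, instead of asking for "1 GPU".
            expression: device.attributes["gpu.nvidia.com"].productName == "NVIDIA A100"
```

The contrast with the old model: instead of `nvidia.com/gpu: 1`, the claim names a device class and constrains its attributes, and the scheduler plus the vendor driver resolve which physical device satisfies it.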
Adopt DRA Instead Of Custom Controllers
- Use DRA when available so you avoid writing custom controllers and CRDs to claim specialized hardware.
- Follow cloud vendor guides (e.g., AWS blog) to integrate DRA in a few YAML lines instead of reinventing scheduling logic.
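Consuming a claim from a workload really is only a few lines of pod spec. A hedged sketch, assuming a `ResourceClaimTemplate` named `single-a100` already exists in the namespace (that name and the container image are hypothetical):

```yaml
# Sketch of a Pod consuming a DRA claim; "single-a100" is an assumed
# ResourceClaimTemplate, not one defined in this episode.
apiVersion: v1
kind: Pod
metadata:
  name: dra-gpu-pod
spec:
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-a100
  containers:
  - name: app
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      claims:
      - name: gpu   # binds this container to the claim declared above
```

No custom controllers or CRDs are needed: the claim declaration and the per-container `resources.claims` reference are the entire user-facing surface.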
