Why the CNCF's New Executive Director is Obsessed With Inference

The New Stack Podcast

Deploying Inference: Ray, KServe, GPUs

Jonathan explains common inference stacks on Kubernetes (Ray, KServe) and how to scale inference across GPUs while maintaining context and cache.
