Working Group Serving, with Yuan Tang and Eduardo Arango
Oct 31, 2024
Yuan Tang is a principal software engineer at Red Hat, focusing on OpenShift AI, and is a leader in Kubernetes WG Serving. Eduardo Arango, a software engineer at NVIDIA, specializes in making Kubernetes suitable for high-performance computing. They delve into the challenges of AI model serving, discussing startup times and Kubernetes API limitations. The conversation also covers the orchestration complexities of large language models and highlights solutions such as ModelMesh for optimizing multi-host environments. Both guests urge listeners to engage and collaborate in the Kubernetes working groups to drive community-led advancements.
The Serving working group within Kubernetes aims to enhance model serving for AI and machine learning workloads by addressing scalability challenges.
Efforts to optimize auto-scaling and resource sharing in Kubernetes are critical for deploying large, multi-GPU models efficiently.
Deep dives
Introduction of the Serving Working Group
The formation of the Serving working group within the Kubernetes community emerged from discussions around the specific needs of AI and machine learning workloads. It addresses challenges particular to model serving, especially those tied to scalability and efficiency. KServe, for example, has introduced advanced techniques for handling models, including pulling model weights from OCI images, which improves startup times and enables capabilities such as image prefetching. The working group aims to develop better foundational pieces that cater to the growing complexity of model serving and benefit the broader cloud-native ecosystem.
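For illustration, here is a minimal sketch of what OCI-based model storage looks like in KServe. The service name, model format, and registry path are hypothetical, and the oci:// storage scheme requires KServe's modelcar support to be enabled in the cluster:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-llm            # hypothetical service name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      # Pull the model weights from an OCI image rather than object storage,
      # so nodes can prefetch and cache the image to shorten cold starts.
      storageUri: oci://registry.example.com/models/example-llm:v1
```

Packaging weights as image layers lets the container runtime's existing caching and prefetching machinery do the heavy lifting, instead of re-downloading the model on every pod start.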
Mission and Goals of the Working Group
The mission of the Serving working group is to optimize serving workloads on Kubernetes, with a focus on hardware-accelerated AI and machine learning inference. Its goals include improving workload controllers in Kubernetes, addressing auto-scaling effectively, and coordinating with related working groups on efficient resource sharing. Engaging with community feedback, the group gathers insights on various use cases to develop standardized recommendations and solutions. By offering better primitives, it aims to advance serving systems to meet the demands of generative AI and other evolving workloads.
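To make the auto-scaling goal concrete: the built-in CPU and memory signals map poorly onto inference workloads, so a common pattern today is scaling on a workload-level metric instead. A sketch using the standard autoscaling/v2 API, assuming a hypothetical per-pod inference_queue_depth metric exposed through the custom metrics API (e.g., via Prometheus Adapter):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-server             # hypothetical inference deployment
  minReplicas: 1
  maxReplicas: 8
  metrics:
  - type: Pods
    pods:
      metric:
        name: inference_queue_depth   # hypothetical per-pod metric
      target:
        type: AverageValue
        averageValue: "4"        # scale out when pods queue more than 4 requests on average
```

Part of the working group's task is deciding which such signals are worth standardizing, so that users do not have to hand-roll this wiring per deployment.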
Challenges and Limitations in Kubernetes
The group is tackling several limitations in Kubernetes, particularly around multi-node, multi-GPU serving for very large models. Kubernetes currently offers no good way to define a workload that spans multiple GPUs across nodes, which complicates deploying large models. Auto-scaling is also difficult, in large part because latency- and utilization-related metrics are hard to measure accurately. As model sizes keep growing, these challenges call for collaborative work on new solutions within the Kubernetes architecture; one emerging answer to the multi-host gap is sketched below.
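One project addressing the multi-host gap is LeaderWorkerSet (github.com/kubernetes-sigs/lws), which models a group of pods — one leader plus several workers — as a single replica. A minimal sketch, with the image and sizes chosen purely for illustration:

```yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: llm-multihost
spec:
  replicas: 2                  # two independent copies of the model
  leaderWorkerTemplate:
    size: 4                    # each copy spans 4 pods: 1 leader + 3 workers
    workerTemplate:
      spec:
        containers:
        - name: inference-worker
          image: registry.example.com/inference-runtime:latest  # hypothetical image
          resources:
            limits:
              nvidia.com/gpu: "8"   # 8 GPUs per pod, 32 per model copy
```

The key point is that scaling, rolling updates, and failure handling operate on the whole leader-plus-workers group rather than on individual pods, which matches how a sharded model must actually be treated.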
Work Streams and Their Focus Areas
Several work streams within the Serving working group focus on critical areas: orchestration, multi-host serving, and dynamic resource allocation. The orchestration stream works on high-level abstractions for serving workloads, integrating ideas like blueprint APIs to simplify deploying inference workloads. The auto-scaling stream emphasizes combining hardware-level and software-level metrics to improve scaling decisions. The dynamic resource allocation (DRA) stream identifies the feature requests that matter most for serving, so they can be prioritized within the larger device-management effort; a sketch of the DRA API follows.
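To ground the DRA discussion, here is roughly what requesting an accelerator through dynamic resource allocation looks like. The API is still maturing (resource.k8s.io/v1beta1 as of Kubernetes 1.32, alpha versions before that), and the device class and names below are illustrative:

```yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.example.com   # hypothetical DeviceClass from a DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: inference-pod
spec:
  containers:
  - name: inference
    image: registry.example.com/inference-runtime:latest  # hypothetical image
    resources:
      claims:
      - name: gpu              # consume the claim by name
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
```

Unlike the fixed counted-resource model (e.g., nvidia.com/gpu: "1"), DRA lets drivers describe devices with structured attributes, which is what makes serving-specific requests such as GPU memory or interconnect topology expressible at all.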
Yuan is a principal software engineer at Red Hat, working on OpenShift AI. Previously, he led AI infrastructure and platform teams at various companies. He holds leadership positions in open source projects, including Argo, Kubeflow, and Kubernetes WG Serving. Yuan has authored three technical books and is a regular conference speaker, technical advisor, and leader at various organizations.
Eduardo is an environmental engineer who was derailed into software engineering. He has spent more than eight years making containerized environments the de facto solution for high-performance computing (HPC), beginning as a core contributor to Singularity Containers, known today as Apptainer under the Linux Foundation. In 2019 Eduardo moved up the ladder to work on making Kubernetes better for performance-oriented applications. Today he works at NVIDIA on the Core Cloud Native team, enabling specialized accelerators in Kubernetes workloads.
Do you have something cool to share? Some questions? Let us know: