Kubernetes Podcast from Google

Working Group Serving, with Yuan Tang and Eduardo Arango

Oct 31, 2024
Yuan Tang is a principal software engineer at Red Hat, focusing on OpenShift AI, and is a leader in Kubernetes WG Serving. Eduardo Arango, a software engineer at NVIDIA, specializes in making Kubernetes suitable for high-performance computing. They delve into the challenges of AI model serving, discussing startup times and Kubernetes API limitations. The conversation also covers orchestration complexities for large language models and highlights solutions like ModelMesh for optimizing multi-host environments. They encourage engagement and collaboration in Kubernetes working groups to drive community-driven advancements.
INSIGHT

WG Serving's Focus

  • Working Group Serving (WG Serving) focuses on improving Kubernetes for AI/ML workloads, especially model serving.
  • This arose from community interest in better solutions to challenges like model pulling and hardware acceleration (a GPU-scheduling sketch follows this list).
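To make the hardware-acceleration point concrete, here is a minimal sketch (not from the episode) of a serving Pod that requests one GPU using the official Kubernetes Python client; the Pod name and container image are hypothetical, and the nvidia.com/gpu resource assumes the NVIDIA device plugin is installed on the cluster.

```python
# Hypothetical example: a model-serving Pod that requests one NVIDIA GPU so the
# scheduler only places it on a node with a free accelerator.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in-cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="llm-server", labels={"app": "llm-server"}),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="server",
                image="example.com/llm-server:latest",  # hypothetical image
                resources=client.V1ResourceRequirements(
                    # nvidia.com/gpu is the extended resource exposed by the
                    # NVIDIA device plugin.
                    limits={"nvidia.com/gpu": "1"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```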
INSIGHT

AI Workloads: Training vs. Inferencing vs. Serving

  • AI workloads have two core phases: training, where a model learns patterns from data, and inference, where the trained model produces predictions on new inputs.
  • Serving infrastructure keeps models available to respond to prompts and scales them to meet demand (see the autoscaling sketch after this list).
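As a rough illustration of "scale to meet demand", the sketch below creates a HorizontalPodAutoscaler for a hypothetical llm-server Deployment, assuming a recent Kubernetes Python client that exposes the autoscaling/v2 API; real inference autoscaling often keys on request or accelerator metrics rather than CPU.

```python
# Hypothetical example: scale the "llm-server" Deployment between 1 and 8
# replicas based on average CPU utilization.
from kubernetes import client, config

config.load_kube_config()

hpa = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "llm-server"},
    "spec": {
        "scaleTargetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "llm-server",  # hypothetical Deployment serving the model
        },
        "minReplicas": 1,  # keep at least one replica available at all times
        "maxReplicas": 8,
        "metrics": [
            {
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization", "averageUtilization": 70},
                },
            }
        ],
    },
}

client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```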
INSIGHT

WG Serving's Mission

  • WG Serving aims to improve Kubernetes' handling of compute-intensive inference tasks using specialized accelerators.
  • These improvements can benefit other workloads, and new primitives can be reused in projects like KServe, Kaito, and Ray (see the KServe sketch after this list).
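To show where a project like KServe fits on top of these primitives, here is a hedged sketch (not taken from the episode) that creates a KServe InferenceService through the Kubernetes custom-objects API; the service name, model format, and storage URI are made up, and KServe's own Python SDK offers typed models for the same resource.

```python
# Hypothetical example: an InferenceService handled by KServe's controller,
# created via the generic CustomObjectsApi.
from kubernetes import client, config

config.load_kube_config()

isvc = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "example-llm"},
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "huggingface"},    # assumed runtime
                "storageUri": "pvc://models/example-llm",  # hypothetical location
                "resources": {"limits": {"nvidia.com/gpu": "1"}},
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="default",
    plural="inferenceservices",
    body=isvc,
)
```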